**Isil Dillig Serdar Tasiran (Eds.)**

# LNCS 11562

# **Computer Aided Verification**

**31st International Conference, CAV 2019 New York City, NY, USA, July 15–18, 2019 Proceedings, Part II**

## Lecture Notes in Computer Science 11562

Commenced Publication in 1973. Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

#### Editorial Board Members

David Hutchison (Lancaster University, Lancaster, UK); Takeo Kanade (Carnegie Mellon University, Pittsburgh, PA, USA); Josef Kittler (University of Surrey, Guildford, UK); Jon M. Kleinberg (Cornell University, Ithaca, NY, USA); Friedemann Mattern (ETH Zurich, Zurich, Switzerland); John C. Mitchell (Stanford University, Stanford, CA, USA); Moni Naor (Weizmann Institute of Science, Rehovot, Israel); C. Pandu Rangan (Indian Institute of Technology Madras, Chennai, India); Bernhard Steffen (TU Dortmund University, Dortmund, Germany); Demetri Terzopoulos (University of California, Los Angeles, CA, USA); Doug Tygar (University of California, Berkeley, CA, USA)

More information about this series at http://www.springer.com/series/7407


Editors

Isil Dillig, University of Texas, Austin, TX, USA

Serdar Tasiran, Amazon Web Services, New York, NY, USA

ISSN 0302-9743, ISSN 1611-3349 (electronic). Lecture Notes in Computer Science. ISBN 978-3-030-25542-8, ISBN 978-3-030-25543-5 (eBook). https://doi.org/10.1007/978-3-030-25543-5

LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© The Editor(s) (if applicable) and The Author(s) 2019. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

#### Preface

It was our privilege to serve as the program chairs for CAV 2019, the 31st International Conference on Computer-Aided Verification. CAV 2019 was held in New York, USA, during July 15–18, 2019. The tutorial day was on July 14, 2019, and the pre-conference workshops were held during July 13–14, 2019. All events took place in The New School in New York City.

CAV is an annual conference dedicated to the advancement of the theory and practice of computer-aided formal analysis methods for hardware and software systems. The primary focus of CAV is to extend the frontiers of verification techniques by expanding to new domains such as security, quantum computing, and machine learning. This puts CAV at the cutting edge of formal methods research, and this year's program is a reflection of this commitment.

CAV 2019 received a very high number of submissions (258). We accepted 13 tool papers, two case studies, and 52 regular papers, which amounts to an acceptance rate of roughly 26%. The accepted papers cover a wide spectrum of topics, from theoretical results to applications of formal methods. These papers apply or extend formal methods to a wide range of domains such as concurrency, learning, and industrially deployed systems. The program featured invited talks by Dawn Song (UC Berkeley), Swarat Chaudhuri (Rice University), and Ken McMillan (Microsoft Research) as well as invited tutorials by Emina Torlak (University of Washington) and Ranjit Jhala (UC San Diego). Furthermore, we continued the tradition of Logic Lounge, a series of discussions on computer science topics targeting a general audience.

In addition to the main conference, CAV 2019 hosted the following workshops: The Best of Model Checking (BeMC) in honor of Orna Grumberg, Design and Analysis of Robust Systems (DARS), Verification Mentoring Workshop (VMW), Numerical Software Verification (NSV), Verified Software: Theories, Tools, and Experiments (VSTTE), Democratizing Software Verification, Formal Methods for ML-Enabled Autonomous Systems (FoMLAS), and Synthesis (SYNT).

Organizing a top conference like CAV requires a great deal of effort from the community. The Program Committee for CAV 2019 consisted of 79 members; a committee of this size ensures that each member has to review only a reasonable number of papers in the allotted time. In all, the committee members wrote over 770 reviews while investing significant effort to maintain and ensure the high quality of the conference program. We are grateful to the CAV 2019 Program Committee for their outstanding efforts in evaluating the submissions and making sure that each paper got a fair chance.

Like last year's CAV, we made artifact evaluation mandatory for tool submissions and optional but encouraged for the rest of the accepted papers. The Artifact Evaluation Committee consisted of 27 reviewers who put in significant effort to evaluate each artifact. The goal of this process was to provide constructive feedback to tool developers and help make the research published in CAV more reproducible. The Artifact Evaluation Committee was generally quite impressed by the quality of the artifacts, and, in fact, all accepted tools passed the artifact evaluation. Among regular papers, 65% of the authors submitted an artifact, and 76% of these artifacts passed the evaluation. We are also very grateful to the Artifact Evaluation Committee for their hard work and dedication in evaluating the submitted artifacts.

CAV 2019 would not have been possible without the tremendous help we received from several individuals, and we would like to thank everyone who helped make CAV 2019 a success. First, we would like to thank Yu Feng and Ruben Martins for chairing the Artifact Evaluation Committee and Zvonimir Rakamaric for maintaining the CAV website and social media presence. We also thank Oksana Tkachuk for chairing the workshop organization process, Peter O'Hearn for managing sponsorship, and Thomas Wies for arranging student fellowships. We also thank Loris D'Antoni, Rayna Dimitrova, Cezara Dragoi, and Anthony W. Lin for organizing the Verification Mentoring Workshop and working closely with us. Last but not least, we would like to thank Kostas Ferles, Navid Yaghmazadeh, and members of the CAV Steering Committee (Ken McMillan, Aarti Gupta, Orna Grumberg, and Daniel Kroening) for helping us with several important aspects of organizing CAV 2019.

We hope that you will find the proceedings of CAV 2019 scientifically interesting and thought-provoking!

June 2019

Isil Dillig, Serdar Tasiran

## Organization

#### Program Chairs



#### Program Committee


Marc Brockschmidt (Microsoft, UK); Pavol Cerny (University of Colorado Boulder, USA); Swarat Chaudhuri (Rice University, USA); Wei-Ngan Chin (National University of Singapore); Adam Chlipala (Massachusetts Institute of Technology, USA); Hana Chockler (King's College London, UK); Eva Darulova (Max Planck Institute for Software Systems, Germany); Cristina David (University of Cambridge, UK); Dana Drachsler Cohen (ETH Zurich, Switzerland); Cezara Dragoi (Inria Paris, ENS, France); Constantin Enea (IRIF, University of Paris Diderot, France); Azadeh Farzan (University of Toronto, Canada); Grigory Fedyukovich (Princeton University, USA); Yu Feng (University of California, Santa Barbara, USA); Dana Fisman (Ben-Gurion University, Israel); Milos Gligoric (The University of Texas at Austin, USA); Patrice Godefroid (Microsoft, USA); Laure Gonnord (University of Lyon/Laboratoire d'Informatique du Parallélisme, France); Aarti Gupta (Princeton University, USA); Arie Gurfinkel (University of Waterloo, Canada); Klaus Havelund (Jet Propulsion Laboratory, USA); Chris Hawblitzel (Microsoft, USA); Alan J. Hu (The University of British Columbia, Canada); Shachar Itzhaky (Technion, Israel); Franjo Ivancic (Google, USA); Ranjit Jhala (University of California San Diego, USA); Rajeev Joshi (Automated Reasoning Group, Amazon Web Services, USA); Dejan Jovanović (SRI International, USA); Laura Kovacs (Vienna University of Technology, Austria); Burcu Kulahcioglu Ozkan (MPI-SWS, Germany); Marta Kwiatkowska (University of Oxford, UK); Shuvendu Lahiri (Microsoft, USA); Akash Lal (Microsoft, India); Stephen Magill (Galois, Inc., USA); Joao Marques-Silva (Universidade de Lisboa, Portugal); Ruben Martins (Carnegie Mellon University, USA); Ken McMillan (Microsoft, USA); Vijay Murali (Facebook, USA); Peter Müller (ETH Zurich, Switzerland); Mayur Naik (Intel, USA); Hakjoo Oh (Korea University, South Korea); Oded Padon (Stanford University, USA); Corina Pasareanu (CMU/NASA Ames Research Center, USA); Ruzica Piskac (Yale University, USA)


#### Artifact Evaluation Committee



#### Mentoring Workshop Organizing Committee


#### Steering Committee


#### Additional Reviewers

Sepideh Asadi Lucas Asadi Haniel Barbosa Ezio Bartocci Sam Bartocci Suda Bharadwaj Erdem Biyik Martin Biyik Timothy Bourke Julien Braine Steven Braine Benjamin Caulfield Eti Chaudhary Xiaohong Chaudhary Yinfang Chen Andreea Costea Murat Costea

Emanuele D'Osualdo Nicolas Dilley Marko Dilley Bruno Dutertre Marco Eilers Cindy Eilers Yotam Feldman Jerome Feret Daniel Feret Mahsa Ghasemi Shromona Ghosh Anthony Ghosh Bernhard Gleiss Shilpi Goel William Goel Mirazul Haque Ludovic Henrio

Andreas Henrio Antti Hyvärinen Duligur Ibeling Rinat Ibeling Nouraldin Jaber Swen Jacobs Maximilian Jacobs Susmit Jha Anja Karl Jens Karl Sean Kauffman Ayrat Khalimov Bettina Khalimov Hillel Kugler Daniel Larraz Christopher Larraz Wonyeol Lee Matt Lewis Wenchao Lewis Kaushik Mallik Matteo Marescotti David Marescotti Dmitry Mordvinov Matthieu Moy Thanh Toan Moy Victor Nicolet Andres Noetzli Abraham Noetzli Saswat Padhi Karl Palmskog

Rong Palmskog Daejun Park Brandon Paulsen Lucas Paulsen Adi Yoga Prabawa Dhananjay Raju Andrew Raju Heinz Riener Sriram Sankaranarayanan Mark Sankaranarayanan Yagiz Savas Traian Florin Serbanuta Fu Serbanuta Yahui Song Pramod Subramanyan Rob Subramanyan Sol Swords Martin Tappler Ta Quang Tappler Anthony Vandikas Marcell Vazquex-Chanlatte Yuke Vazquex-Chanlatte Min Wen Josef Widder Bo Widder Haoze Wu Zhe Xu May Xu Yi Zhang Zhizhou Zhang

## Contents – Part II

#### Logics, Decision Procedures, and Solvers


#### Verification



Communication-Closed Asynchronous Protocols. Andrei Damian, Cezara Drăgoi, Alexandru Militaru, and Josef Widder

#### Verification and Invariants



## Contents – Part I

#### Automata and Timed Systems


#### Synthesis


#### Model Checking


#### Cyber-Physical Systems and Machine Learning


#### Dynamical, Hybrid, and Reactive Systems


Logics, Decision Procedures, and Solvers

## **Satisfiability Checking for Mission-Time LTL**

Jianwen Li<sup>1</sup> (✉), Moshe Y. Vardi<sup>2</sup>, and Kristin Y. Rozier<sup>1</sup> (✉)

<sup>1</sup> Iowa State University, Ames, IA, USA (lijwen2748@gmail.com, kyrozier@iastate.edu)

<sup>2</sup> Rice University, Houston, TX, USA

**Abstract.** Mission-time LTL (MLTL) is a bounded variant of MTL over naturals designed to generically specify requirements for mission-based system operation common to aircraft, spacecraft, vehicles, and robots. Despite the utility of MLTL as a specification logic, major gaps remain in analyzing MLTL, e.g., for specification debugging or model checking, centering on the absence of any complete MLTL satisfiability checker. We prove that the MLTL satisfiability checking problem is NEXPTIME-complete and that satisfiability checking MLTL<sub>0</sub>, the variant of MLTL where all intervals start at 0, is PSPACE-complete. We introduce translations for MLTL-to-LTL, MLTL-to-LTL<sub>f</sub>, MLTL-to-SMV, and MLTL-to-SMT, creating four options for MLTL satisfiability checking. Our extensive experimental evaluation shows that the MLTL-to-SMT translation with the Z3 SMT solver offers the most scalable performance.

#### **1 Introduction**

Mission-time LTL (MLTL) [34] has the syntax of Linear Temporal Logic with the option of integer bounds on the temporal operators. It was created as a generalization of the variations [3,14,25] on finitely-bounded linear temporal logic, ideal for specification of missions carried out by aircraft, spacecraft, rovers, and other vehicular or robotic systems. MLTL provides the readability of LTL [32], while assuming, when a different duration is not specified, that all requirements must be upheld during the (a priori known) length of a given mission, such as during the half-hour battery life of an Unmanned Aerial System (UAS). Using integer bounds instead of real-number or real-time bounds leads to more generic specifications that are adaptable to model checking at different levels of abstraction, or to runtime monitoring on different platforms (e.g., in software vs. in hardware). Integer bounds should be read as generic time units, referring to the basic temporal resolution of the system, which can be resolved to units such as clock ticks or seconds depending on the mission. Integer bounds also allow generic specification with respect to different granularities of time, e.g., to allow easy updates to model-checking models, and re-usable specifications for the same requirements on different embedded systems that may have different resource limits for storing runtime monitors. MLTL has been used in many industrial case studies [18,28,34,37,42–44], and was the official logic of the 2018 Runtime Verification Benchmark Competition [1]. Many specifications from other case studies, in logics such as MTL [3] and STL [25], can be represented in MLTL. We intuitively relate MLTL to LTL and MTL-over-naturals as follows: (1) MLTL formulas are LTL formulas with bounded intervals over temporal operators, interpreted over finite traces; (2) MLTL formulas are MTL-over-naturals formulas without any unbounded intervals, interpreted over finite traces.

Despite the practical utility of MLTL, no model checker currently accepts this logic as a specification language. The model checker nuXmv encodes a related logic for use in symbolic model checking, where the □ and ♦ operators of an LTLSPEC can have integer bounds [21], though bounds cannot be placed on the U or V (the Release operator of nuXmv) operators.

We also critically need an MLTL satisfiability checker to enable specification debugging. Specification is a major bottleneck to the formal verification of mission-based, especially autonomous, systems [35], with a key part of the problem being the availability of good tools for *specification debugging*. Satisfiability checking is an integral tool for specification debugging: [38,39] argued that for every requirement ϕ we need to check both ϕ and ¬ϕ for satisfiability; we also need to check the conjunction of all requirements to ensure that they can all be true of the same system at the same time. Specification debugging is essential to model checking [39–41] because a positive answer may not mean there is no bug if the specification is valid, and a negative answer may not mean there is a bug if the specification is unsatisfiable. Specification debugging is critical for synthesis and runtime verification (RV) since in these cases there is no model; synthesis and RV are both entirely dependent on the specification. For synthesis, satisfiability checking is the best-available specification-debugging technique, since other techniques, such as vacuity checking (cf. [6,10]), reference a model in addition to the specification. While there are artifacts one can use in RV, specification debugging is still limited outside of satisfiability checking, yet central to correct analysis. A false positive due to RV of an incorrect specification can have disastrous consequences, such as triggering an abort of an (otherwise successful) mission to Mars. Arguably, the biggest challenge to creating an RV algorithm or tool is the dearth of benchmarks for checking correctness or comparatively analyzing these [36], where a benchmark consists of some runtime trace, a temporal logic formula reasoning about that trace, and some verdict designating whether the trace at a given time satisfies the requirement formula. An MLTL satisfiability solver is useful for RV benchmark generation [22].

Despite the critical need for an MLTL satisfiability solver, no such tool currently exists. To the best of our knowledge, there is only one available solver (*zot* [8]) for checking the satisfiability of MTL-over-naturals formulas, interpreted over infinite traces. Since MLTL formulas are interpreted over finite traces and there is no trivial reduction from one setting to the other, *zot* cannot be directly applied to MLTL satisfiability checking.

Our approach is inspired by satisfiability-checking algorithms for other logics. For LTL satisfiability solving, we observe that there are multiple efficient translations from LTL satisfiability to model checking, using nuXmv [40]; we therefore consider here translations to nuXmv model checking, both indirectly (as a translation to LTL) and directly using the new KLIVE [13] back-end and the BMC back-end, taking advantage of the bounded nature of MLTL. The bounded nature of MLTL also enables us to consider a direct encoding at the word level, suitable as input to an SMT solver. Our contribution is both theoretical and experimental. First, we consider the complexity of such translations: we prove that the MLTL satisfiability checking problem is NEXPTIME-complete and that satisfiability checking MLTL<sub>0</sub>, the variant of MLTL where all intervals start at 0, is PSPACE-complete. Second, we introduce translation algorithms for MLTL-to-LTL<sub>f</sub> (LTL over finite traces [14]), MLTL-to-LTL, MLTL-to-SMV, and MLTL-to-SMT, thus creating four options for MLTL satisfiability checking. Our results show that the MLTL-to-SMT translation with the Z3 SMT solver offers the most scalable performance, though the MLTL-to-SMV translation with an SMV model checker can offer the best performance when the intervals in the MLTL formulas are restricted to small ranges less than 100.

#### **2 Preliminaries**

A (closed) interval over naturals I = [a, b] (0 ≤ a ≤ b are natural numbers) is the set of naturals {i | a ≤ i ≤ b}. I is called *bounded* iff b < +∞; otherwise I is *unbounded*. MLTL is defined using bounded intervals. Unlike Metric Temporal Logic (MTL) [4], MLTL does not need open or half-open intervals over the natural domain, as every open or half-open bounded interval is reducible to an equivalent closed bounded interval (or to the empty set), e.g., (1,2) = ∅, (1,3) = [2,2], (1,3] = [2,3], etc. Let AP be a set of atomic propositions; then the syntax of a formula in MLTL is

$$\varphi ::= \mathtt{true} \mid \mathtt{false} \mid p \mid \neg \varphi \mid \varphi \land \psi \mid \varphi \lor \psi \mid \Box_I \varphi \mid \lozenge_I \varphi \mid \varphi \,\mathcal{U}_I\, \psi \mid \varphi \,\mathcal{R}_I\, \psi$$

where I is a bounded interval, p ∈ AP is an *atom*, and ϕ and ψ are subformulas.
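Over the naturals, normalizing open and half-open bounded intervals is mechanical: an excluded endpoint simply moves inward by one. The Python sketch below illustrates this; the function `to_closed` and its `lo_open`/`hi_open` flags are hypothetical names chosen for this example, and `None` stands for the empty interval.

```python
from typing import Optional, Tuple


def to_closed(lo: int, hi: int, lo_open: bool = False,
              hi_open: bool = False) -> Optional[Tuple[int, int]]:
    """Normalize a bounded interval over the naturals to a closed [a, b].

    An excluded lower endpoint lo becomes lo + 1 and an excluded upper
    endpoint hi becomes hi - 1; None stands for the empty interval.
    """
    a = lo + 1 if lo_open else lo
    b = hi - 1 if hi_open else hi
    return (a, b) if a <= b else None


# The examples from the text.
print(to_closed(1, 2, lo_open=True, hi_open=True))  # None: (1,2) is empty
print(to_closed(1, 3, lo_open=True, hi_open=True))  # (2, 2)
print(to_closed(1, 3, lo_open=True))                # (2, 3)
```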

Given two MLTL formulas ϕ, ψ, we write ϕ = ψ iff they are *syntactically equivalent*, and ϕ ≡ ψ iff they are *semantically equivalent*, i.e., π |= ϕ iff π |= ψ for every finite trace π. In MLTL semantics, we define false ≡ ¬true, ϕ ∨ ψ ≡ ¬(¬ϕ ∧ ¬ψ), ¬(ϕ U<sub>I</sub> ψ) ≡ (¬ϕ R<sub>I</sub> ¬ψ), and ¬♦<sub>I</sub>ϕ ≡ □<sub>I</sub>¬ϕ. MLTL keeps the standard operator equivalences from LTL, including ♦<sub>I</sub>ϕ ≡ (true U<sub>I</sub> ϕ), □<sub>I</sub>ϕ ≡ (false R<sub>I</sub> ϕ), and (ϕ R<sub>I</sub> ψ) ≡ ¬(¬ϕ U<sub>I</sub> ¬ψ). Notably, MLTL discards the neXt (X) operator, which is essential in LTL [32], since X ϕ is semantically equivalent to ♦<sub>[1,1]</sub>ϕ.

The semantics of MLTL formulas is interpreted over finite traces, and the bounds of intervals are written in base-10 (decimal) notation. Let π be a finite trace in which every position π[i] (i ≥ 0) is over 2<sup>AP</sup>, and let |π| denote the length of π (|π| < +∞ since π is finite). We use π<sub>i</sub> (|π| > i ≥ 0) to represent the suffix of π starting from position i (including i). Let a, b be natural numbers with a ≤ b; we define that π models (satisfies) an MLTL formula ϕ, denoted as π |= ϕ, as follows:

– π |= true;

– π |= p iff p ∈ π[0];

– π |= ¬ϕ iff π ⊭ ϕ;

– π |= ϕ ∧ ψ iff π |= ϕ and π |= ψ;

– π |= ϕ U<sub>[a,b]</sub> ψ iff |π| > a and there exists i ∈ [a, b] with i < |π| such that π<sub>i</sub> |= ψ and π<sub>j</sub> |= ϕ for every j ∈ [a, b] with j < i.

Compared to the traditional MTL-over-naturals<sup>1</sup> [16], the Until formula in MLTL is interpreted in a slightly different way. In MTL-over-naturals, the satisfaction of ϕ U<sub>I</sub> ψ requires ϕ to hold from position 0 to the position where ψ holds (in I), while in MLTL ϕ is only required to hold within the interval I, before the time ψ holds. From the perspective of writing specifications, cf. [34,37], this adjustment is more user-friendly.

<sup>1</sup> In this paper, MTL-over-naturals is interpreted over finite traces.

It is not hard to see that MLTL is as expressive as the standard MTL-over-naturals: for a > 0, the formula ϕ U<sub>[a,b]</sub> ψ in MTL-over-naturals can be represented as (□<sub>[0,a−1]</sub>ϕ) ∧ (ϕ U<sub>[a,b]</sub> ψ) in MLTL (for a = 0 the two Until operators coincide); conversely, ϕ U<sub>[a,b]</sub> ψ in MLTL can be represented as ♦<sub>[a,a]</sub>(ϕ U<sub>[0,b−a]</sub> ψ) in MTL-over-naturals.
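Both translations can be cross-checked mechanically on short traces. The Python sketch below is an illustration, not the authors' tooling: formulas use a hypothetical tuple encoding, `"U"` is the MLTL Until as described in this section (the suffix must be longer than a, the witness i lies in [a, b] with i < |π|, and the left operand holds on [a, i)), and `"MU"` is the Until of MTL-over-naturals over finite traces (left operand holding on [0, i)). The first translation is checked only for a > 0, where □<sub>[0,a−1]</sub> is defined. Exhaustive enumeration over traces of length at most 5 is evidence, not a proof.

```python
from itertools import product

TRUE = ("true",)


def holds(f, tr, pos=0):
    """Does the suffix of trace `tr` starting at `pos` satisfy `f`?

    Hypothetical encoding: ("true",), ("ap", name), ("not", g),
    ("and", g, h), ("U", a, b, g, h) for the MLTL Until and
    ("MU", a, b, g, h) for the MTL-over-naturals Until (finite traces).
    """
    n = len(tr) - pos                      # length of the suffix
    op = f[0]
    if op == "true":
        return True
    if op == "ap":
        return n > 0 and f[1] in tr[pos]
    if op == "not":
        return not holds(f[1], tr, pos)
    if op == "and":
        return holds(f[1], tr, pos) and holds(f[2], tr, pos)
    if op in ("U", "MU"):
        _, a, b, lhs, rhs = f
        if op == "U" and n <= a:           # MLTL requires |suffix| > a
            return False
        start = a if op == "U" else 0      # where lhs must begin to hold
        return any(
            holds(rhs, tr, pos + i)
            and all(holds(lhs, tr, pos + j) for j in range(start, i))
            for i in range(a, min(b, n - 1) + 1))
    raise ValueError(f"unknown operator {op!r}")


def always(a, b, g):
    """MLTL Box_[a,b] g, encoded as not(true U_[a,b] not g)."""
    return ("not", ("U", a, b, TRUE, ("not", g)))


def traces(atoms, max_len):
    """All traces over `atoms` with length 0..max_len."""
    steps = [frozenset(x for x, keep in zip(atoms, bits) if keep)
             for bits in product([False, True], repeat=len(atoms))]
    for n in range(max_len + 1):
        yield from product(steps, repeat=n)


def equivalent(f, g, atoms="pq", max_len=5):
    return all(holds(f, t) == holds(g, t) for t in traces(atoms, max_len))


P, Q = ("ap", "p"), ("ap", "q")
for a in range(1, 3):
    for b in range(a, a + 3):
        # MTL p U_[a,b] q  ==  (Box_[0,a-1] p) and (p U_[a,b] q) in MLTL
        assert equivalent(("MU", a, b, P, Q),
                          ("and", always(0, a - 1, P), ("U", a, b, P, Q)))
for a in range(3):
    for b in range(a, a + 3):
        # MLTL p U_[a,b] q  ==  Diamond_[a,a](p U_[0,b-a] q) in MTL
        assert equivalent(("U", a, b, P, Q),
                          ("MU", a, a, TRUE, ("MU", 0, b - a, P, Q)))
print("both translations agree on all traces of length at most 5")
```

The two Until variants genuinely differ once a > 0: for example, `("MU", 1, 3, P, Q)` and `("U", 1, 3, P, Q)` disagree on a trace where q holds at position 1 but p does not hold at position 0.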

We say an MLTL formula is in *BNF* if the formula contains only the ¬, ∧, and U<sub>I</sub> operators. It is easy to see that every MLTL formula can be converted to a (semantically) equivalent BNF formula at linear cost. Consider ϕ = (¬a) ∨ ((¬b) R<sub>I</sub> (¬c)) as an example: its BNF is ¬(a ∧ (b U<sub>I</sub> c)). Unless stated otherwise, this paper assumes that every MLTL formula is in BNF.
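A linear-time conversion to BNF can be sketched as a single structural recursion, assuming a hypothetical tuple encoding of formulas. The helper `neg` cancels double negations, so the example from the text comes out in exactly the stated form; the concrete interval I = [2, 5] is chosen only for illustration.

```python
TRUE = ("true",)


def neg(f):
    """Negate f, cancelling a double negation."""
    return f[1] if f[0] == "not" else ("not", f)


def to_bnf(f):
    """Rewrite an MLTL formula into BNF (only not, and, U) in linear time.

    Hypothetical tuple encoding: ("true",), ("false",), ("ap", name),
    ("not", g), ("and", g, h), ("or", g, h), ("U", a, b, g, h),
    ("R", a, b, g, h), ("F", a, b, g) for Diamond, ("G", a, b, g) for Box.
    """
    op = f[0]
    if op in ("true", "ap"):
        return f
    if op == "false":                 # false == not true
        return neg(TRUE)
    if op == "not":
        return neg(to_bnf(f[1]))
    if op == "and":
        return ("and", to_bnf(f[1]), to_bnf(f[2]))
    if op == "or":                    # g or h == not(not g and not h)
        return neg(("and", neg(to_bnf(f[1])), neg(to_bnf(f[2]))))
    if op in ("U", "R"):
        _, a, b, g, h = f
        if op == "U":
            return ("U", a, b, to_bnf(g), to_bnf(h))
        # g R_I h == not(not g U_I not h)
        return neg(("U", a, b, neg(to_bnf(g)), neg(to_bnf(h))))
    if op in ("F", "G"):
        _, a, b, g = f
        if op == "F":                 # Diamond_I g == true U_I g
            return ("U", a, b, TRUE, to_bnf(g))
        # Box_I g == not Diamond_I not g
        return neg(("U", a, b, TRUE, neg(to_bnf(g))))
    raise ValueError(f"unknown operator {op!r}")


# The example from the text with I = [2, 5]:
# (not a) or ((not b) R_I (not c))  becomes  not(a and (b U_I c)).
A, B, C = ("ap", "a"), ("ap", "b"), ("ap", "c")
phi = ("or", ("not", A), ("R", 2, 5, ("not", B), ("not", C)))
print(to_bnf(phi))
```

Each node is visited once and `neg` runs in constant time, which matches the linear-cost claim.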

The closure of an MLTL formula ϕ, denoted as cl(ϕ), is the set of formulas such that: (1) ϕ ∈ cl(ϕ); (2) ψ ∈ cl(ϕ) if ¬ψ ∈ cl(ϕ); (3) ψ, ξ ∈ cl(ϕ) if ψ op ξ ∈ cl(ϕ), where op can be ∧ or U<sub>I</sub>. Let |cl(ϕ)| be the size of cl(ϕ). Since the definition of cl(ϕ) ignores the intervals in ϕ, |cl(ϕ)| is linear in the number of operators in ϕ. We also define the closure(\*) of an MLTL formula ϕ, denoted cl∗(ϕ), as the set of formulas such that: (1) cl(ϕ) ⊆ cl∗(ϕ); (2) if ξ U<sub>[a,b]</sub> ψ ∈ cl∗(ϕ) for 0 < a ≤ b, then ξ U<sub>[a−1,b−1]</sub> ψ ∈ cl∗(ϕ); (3) if ξ U<sub>[0,b]</sub> ψ ∈ cl∗(ϕ) for 0 < b, then ξ U<sub>[0,b−1]</sub> ψ ∈ cl∗(ϕ). Let |cl∗(ϕ)| be the size of cl∗(ϕ) and K be the maximal natural number in the intervals of ϕ. Each Until formula ξ U<sub>[a,b]</sub> ψ ∈ cl(ϕ) gives rise to at most b + 1 ≤ K + 1 formulas in cl∗(ϕ), so |cl∗(ϕ)| is at most (K + 1) · |cl(ϕ)|, i.e., in O(K · |cl(ϕ)|).
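Both closures follow directly from their definitions. In this Python sketch (hypothetical encoding and function names), rules (2) and (3) are applied repeatedly to each Until formula of cl(ϕ), so an Until with upper bound b contributes at most b + 1 formulas to cl∗(ϕ). The example is the BNF formula ¬(a ∧ (b U<sub>[2,5]</sub> c)), with the interval chosen for illustration.

```python
def closure(f):
    """cl(f): f and all of its subformulas, intervals kept unchanged.
    Hypothetical encoding: ("true",), ("ap", name), ("not", g),
    ("and", g, h), ("U", a, b, g, h)."""
    out, todo = set(), [f]
    while todo:
        g = todo.pop()
        if g in out:
            continue
        out.add(g)
        if g[0] == "not":
            todo.append(g[1])
        elif g[0] == "and":
            todo.extend(g[1:])
        elif g[0] == "U":
            todo.extend(g[3:])
    return out


def closure_star(f):
    """cl*(f): cl(f) closed under rules (2) and (3) for Until formulas."""
    out = closure(f)
    for g in list(out):                  # iterate over a snapshot of cl(f)
        if g[0] != "U":
            continue
        _, a, b, lhs, rhs = g
        while b > 0:                     # stop once U_[0,0] is reached
            a, b = (a - 1, b - 1) if a > 0 else (0, b - 1)
            out.add(("U", a, b, lhs, rhs))
    return out


def max_bound(f):
    """K: the largest natural in any interval of f (1 if f has none)."""
    return max((g[2] for g in closure(f) if g[0] == "U"), default=1)


A, B, C = ("ap", "a"), ("ap", "b"), ("ap", "c")
phi = ("not", ("and", A, ("U", 2, 5, B, C)))
print(len(closure(phi)), len(closure_star(phi)), max_bound(phi))
```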

We also consider a fragment of MLTL, namely MLTL<sub>0</sub>, which is more frequently used in practice, cf. [18,34]. Informally speaking, MLTL<sub>0</sub> formulas are MLTL formulas in which all intervals start from 0. For example, ♦<sub>[0,4]</sub>a ∧ (a U<sub>[0,1]</sub> b) is an MLTL<sub>0</sub> formula, while ♦<sub>[2,4]</sub>a is not.
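Membership in MLTL<sub>0</sub> is a purely syntactic check. Here is a short sketch in Python with a hypothetical tuple encoding, where ♦<sub>I</sub>a is written as true U<sub>I</sub> a, applied to the two example formulas from the text.

```python
def is_mltl0(f):
    """True iff every interval in f starts at 0. Hypothetical encoding:
    ("true",), ("ap", name), ("not", g), ("and", g, h), ("U", a, b, g, h)."""
    op = f[0]
    if op in ("true", "ap"):
        return True
    if op == "not":
        return is_mltl0(f[1])
    if op == "and":
        return is_mltl0(f[1]) and is_mltl0(f[2])
    if op == "U":
        _, a, _, lhs, rhs = f
        return a == 0 and is_mltl0(lhs) and is_mltl0(rhs)
    raise ValueError(f"unknown operator {op!r}")


TRUE, A, B = ("true",), ("ap", "a"), ("ap", "b")
# Diamond_[0,4] a  and  (a U_[0,1] b)
print(is_mltl0(("and", ("U", 0, 4, TRUE, A), ("U", 0, 1, A, B))))  # True
# Diamond_[2,4] a
print(is_mltl0(("U", 2, 4, TRUE, A)))                              # False
```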

Given an MLTL formula ϕ, the *satisfiability problem* asks whether there is a finite trace π such that π |= ϕ holds. To solve this problem, we can reduce it to the satisfiability problem of the related logics LTL and LTL<sub>f</sub> (LTL over finite traces [14]), and leverage off-the-shelf satisfiability solvers for these well-explored logics. We abbreviate MLTL, LTL, and LTL<sub>f</sub> satisfiability checking as MLTL-SAT, LTL-SAT, and LTL<sub>f</sub>-SAT, respectively.

**LTL<sub>f</sub>: Linear Temporal Logic over Finite Traces** [14]**.** We assume readers are familiar with LTL (over infinite traces). LTL<sub>f</sub> has the same syntax as LTL, except that in LTL<sub>f</sub> the dual operator of X is N (weak Next), which differs from X only in the last state of the finite trace: in the last state, X ψ can never be satisfied, while N ψ always holds. Given an LTL<sub>f</sub> formula ϕ, there is an LTL formula ψ such that ϕ is satisfiable iff ψ is satisfiable. In detail, ψ = ♦Tail ∧ t(ϕ), where Tail is a new atom identifying the end of the satisfying trace and t(ϕ) is constructed as follows:

– t(p) = p where p is an atom;

– t(¬ψ) = ¬t(ψ);

– t(X ψ) = ¬Tail ∧ X t(ψ);

– t(ψ<sub>1</sub> ∧ ψ<sub>2</sub>) = t(ψ<sub>1</sub>) ∧ t(ψ<sub>2</sub>);

– t(ψ<sub>1</sub> U ψ<sub>2</sub>) = t(¬Tail ∧ ψ<sub>1</sub>) U t(ψ<sub>2</sub>).

In the above reduction, ϕ is assumed to be in BNF. Since the reduction is linear in the size of the original LTL<sub>f</sub> formula and LTL-SAT is PSPACE-complete [45], LTL<sub>f</sub>-SAT is also PSPACE-complete [14].
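The reduction t(·) is a direct structural recursion. The Python sketch below transcribes the rules listed above using a hypothetical tuple encoding (with `"F"` standing for ♦ in the output); it reads Tail as marking the last state of the finite trace and assumes the atom name `Tail` does not occur in the input.

```python
TAIL = ("ap", "Tail")   # fresh atom marking the end of the trace (assumed unused)


def t(f):
    """t(.) for a BNF LTLf formula: ("ap", p), ("not", g), ("and", g, h),
    ("X", g), ("U", g, h). The output uses the same encoding, read as LTL."""
    op = f[0]
    if op == "ap":
        return f
    if op == "not":
        return ("not", t(f[1]))
    if op == "and":
        return ("and", t(f[1]), t(f[2]))
    if op == "X":      # a strong Next cannot be taken from the Tail state
        return ("and", ("not", TAIL), ("X", t(f[1])))
    if op == "U":      # t(not Tail and g) = not Tail and t(g), since t(Tail) = Tail
        return ("U", ("and", ("not", TAIL), t(f[1])), t(f[2]))
    raise ValueError(f"unknown operator {op!r}")


def ltlf_to_ltl(f):
    """The equisatisfiable LTL formula (F Tail) and t(f); "F" is Diamond."""
    return ("and", ("F", TAIL), t(f))


P, Q = ("ap", "p"), ("ap", "q")
print(ltlf_to_ltl(("U", P, ("X", Q))))
```

Each rule adds a constant number of nodes, so the output is linear in the size of the input, as the text states.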

#### **3 Complexity of MLTL-SAT**

It is known that MITL (Metric Interval Temporal Logic) satisfiability is EXPSPACE-complete, and that satisfiability of the MITL fragment MITL<sub>0,∞</sub> is PSPACE-complete [2]. MLTL (resp. MLTL<sub>0</sub>) can be viewed as a variant of MITL (resp. MITL<sub>0,∞</sub>) that is interpreted over the naturals. We show that MLTL satisfiability checking is NEXPTIME-complete, obtaining the upper bound via a reduction from MLTL to LTL<sub>f</sub>.

**Lemma 1.** *Let ϕ be an MLTL formula, and let K be the maximal natural number appearing in the intervals of ϕ (K is set to 1 if there are no intervals in ϕ). There is an LTL<sub>f</sub> formula θ that recognizes the same language as ϕ. Moreover, the size of θ is in O(K · |cl(ϕ)|).*

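*Proof* (Sketch). For an MLTL formula ϕ, we define the LTL<sub>f</sub> formula f(ϕ) recursively as follows:

$$f(\varphi) = \begin{cases} \varphi, & \text{if } \varphi = \mathtt{true} \text{ or } \varphi = p \in AP; \\ \neg f(\xi), & \text{if } \varphi = \neg \xi; \\ f(\xi) \wedge f(\psi), & \text{if } \varphi = \xi \wedge \psi; \\ \mathcal{X}\big(f(\xi \,\mathcal{U}_{[a-1,b-1]}\, \psi)\big), & \text{if } \varphi = \xi \,\mathcal{U}_{[a,b]}\, \psi \text{ and } 0 < a \le b; \\ f(\psi) \vee \big(f(\xi) \wedge \mathcal{X}(f(\xi \,\mathcal{U}_{[0,b-1]}\, \psi))\big), & \text{if } \varphi = \xi \,\mathcal{U}_{[a,b]}\, \psi,\ a = 0 \text{ and } 0 < b; \\ f(\psi), & \text{if } \varphi = \xi \,\mathcal{U}_{[a,b]}\, \psi \text{ and } a = b = 0. \end{cases}$$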

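Here X represents the (strong) neXt operator in LTL<sub>f</sub>. Let θ = f(ϕ); we can prove by induction that ϕ and θ accept the same language. Moreover, based on the aforementioned construction, the size of θ is at most linear in K · |cl(ϕ)|, i.e., in O(K · |cl(ϕ)|), when every distinct subformula of θ is counted once: each recursive call of f is on a formula in cl∗(ϕ) and introduces only a constant number of new operators.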
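The translation of Lemma 1 can be prototyped and compared with the MLTL semantics on short traces. In this Python sketch (hypothetical tuple encoding; both evaluators are written for this example, with X as the strong Next and only non-empty traces, as is usual for LTL<sub>f</sub>), `functools.lru_cache` memoizes f so that identical subformulas are built once. The last lines use a nested Until whose closure has five elements for every K: the number of distinct subformulas of the translation grows linearly in K, while its size as a tree grows quadratically, which is why the O(K · |cl(ϕ)|) bound is naturally read as a count of distinct subformulas.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def to_ltlf(f):
    """f(phi) of Lemma 1 for a BNF MLTL formula in a hypothetical encoding:
    ("true",), ("ap", name), ("not", g), ("and", g, h), ("U", a, b, g, h).
    The cache builds each distinct subformula once, so the result is a DAG."""
    op = f[0]
    if op in ("true", "ap"):
        return f
    if op == "not":
        return ("not", to_ltlf(f[1]))
    if op == "and":
        return ("and", to_ltlf(f[1]), to_ltlf(f[2]))
    if op != "U":
        raise ValueError(f"unknown operator {op!r}")
    _, a, b, xi, psi = f
    if a > 0:                                      # case 0 < a <= b
        return ("X", to_ltlf(("U", a - 1, b - 1, xi, psi)))
    if b > 0:                                      # case a = 0 < b
        nxt = ("X", to_ltlf(("U", 0, b - 1, xi, psi)))
        return ("or", to_ltlf(psi), ("and", to_ltlf(xi), nxt))
    return to_ltlf(psi)                            # case a = b = 0


def mltl(f, tr, i=0):
    """MLTL at suffix i: U needs |suffix| > a and a witness k in [a, b]
    with the left operand holding on [a, k)."""
    op, n = f[0], len(tr) - i
    if op == "true":
        return True
    if op == "ap":
        return n > 0 and f[1] in tr[i]
    if op == "not":
        return not mltl(f[1], tr, i)
    if op == "and":
        return mltl(f[1], tr, i) and mltl(f[2], tr, i)
    _, a, b, xi, psi = f
    return n > a and any(
        mltl(psi, tr, i + k) and all(mltl(xi, tr, i + j) for j in range(a, k))
        for k in range(a, min(b, n - 1) + 1))


def ltlf(f, tr, i=0):
    """LTLf at position i of a non-empty trace; X is the strong Next."""
    op = f[0]
    if op == "true":
        return True
    if op == "ap":
        return f[1] in tr[i]
    if op == "not":
        return not ltlf(f[1], tr, i)
    if op == "and":
        return ltlf(f[1], tr, i) and ltlf(f[2], tr, i)
    if op == "or":
        return ltlf(f[1], tr, i) or ltlf(f[2], tr, i)
    if op == "X":
        return i + 1 < len(tr) and ltlf(f[1], tr, i + 1)
    raise ValueError(f"unknown operator {op!r}")


def tree_size(f):
    """Size of f when repeated subformulas are counted every time."""
    return 1 + sum(tree_size(c) for c in f[1:] if isinstance(c, tuple))


def dag_size(f):
    """Number of distinct subformulas of f."""
    seen, todo = set(), [f]
    while todo:
        g = todo.pop()
        if g not in seen:
            seen.add(g)
            todo.extend(c for c in g[1:] if isinstance(c, tuple))
    return len(seen)


def traces(atoms, max_len):
    """All non-empty traces over `atoms` of length at most max_len."""
    steps = [frozenset(x for x, keep in zip(atoms, bits) if keep)
             for bits in product([False, True], repeat=len(atoms))]
    for n in range(1, max_len + 1):
        yield from product(steps, repeat=n)


P, Q, R = ("ap", "p"), ("ap", "q"), ("ap", "r")
phi = ("U", 0, 3, ("U", 1, 2, P, Q), ("not", P))
assert all(mltl(phi, t) == ltlf(to_ltlf(phi), t) for t in traces("pq", 5))

K = 10
nested = ("U", 0, K, ("U", 0, K, P, Q), R)   # |cl(nested)| is 5 for every K
print(tree_size(to_ltlf(nested)), dag_size(to_ltlf(nested)))
```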

We use the construction shown in Lemma 1 to explore several useful properties of MLTL. For instance, the LTL<sub>f</sub> formula translated from an MLTL formula contains only the X temporal operator or its dual N, which represents weak Next [19,23], and the number of these operators is strictly smaller than K · |cl(ϕ)|. Every X or N subformula in the LTL<sub>f</sub> formula corresponds to some temporal formula in cl∗(ϕ). Notably, because the natural-number intervals in ϕ are written in base-10 (decimal) notation, the blow-up in the translation of Lemma 1 is exponential.

The next lower bound is reminiscent of the NEXPTIME-lower bound shown in [31] for a fragment of Metric Interval Temporal Logic (MITL), but is different in the details of the proof as the two logics are quite different.

**Theorem 1.** *The complexity of MLTL satisfiability checking is NEXPTIME-complete.*

*Proof* (Sketch). By Lemma 1, there is an LTL<sub>f</sub> formula θ that accepts the same traces as the MLTL formula ϕ, and the size of θ is in O(K · |cl(ϕ)|). The only temporal connectives used in θ are X and N, since the translation to LTL<sub>f</sub> reduces all MLTL temporal connectives in ϕ to nested X's or N's (the latter produced by simplifying ¬X). Thus, if θ is satisfiable, then it is satisfiable by a trace whose length is bounded by the length of θ, which is exponential in the size of ϕ because K is written in decimal. Hence, we can guess a trace π of at most exponential length and check in exponential time that it satisfies ϕ. As a result, the upper bound for MLTL-SAT is NEXPTIME.

Before proving the NEXPTIME lower bound, recall the PSPACE-lower-bound proof in [45] for LTL satisfiability. The proof reduces the acceptance problem for a linear-space bounded Turing machine M to LTL satisfiability. Given a Turing machine M and an integer k, we construct a formula ϕ<sub>M</sub> such that ϕ<sub>M</sub> is satisfiable iff M accepts the empty tape using k tape cells. The argument is that we can encode such a space-bounded computation of M by a trace π of length c<sup>k</sup> for some constant c, and then use ϕ<sub>M</sub> to force π to encode an accepting computation of M. The formula ϕ<sub>M</sub> has to match corresponding points in successive configurations of M, which can be expressed using O(k) nested X's, since such points are O(k) positions apart.

To prove a NEXPTIME-lower bound for MLTL, we reduce the acceptance problem for exponentially bounded non-deterministic Turing machines to MLTL satisfiability. Given a non-deterministic Turing machine M and an integer k, we construct an MLTL formula ϕ<sub>M</sub> of length O(k) such that ϕ<sub>M</sub> is satisfiable iff M accepts the empty tape in time 2<sup>k</sup>. Note that such a computation of a 2<sup>k</sup>-time bounded Turing machine consists of 2<sup>k</sup> configurations of length 2<sup>k</sup> each, so the whole computation is of exponential length, 4<sup>k</sup>, and can be encoded by a trace π of length 4<sup>k</sup>, where every point of π encodes one cell in the computation of M. Unlike the reduction in [45], in this encoding corresponding points in successive configurations are exponentially far (2<sup>k</sup>) from each other, because each configuration has 2<sup>k</sup> cells, so the relationship between such successive points cannot be expressed by an LTL formula of size polynomial in k. However, because the constants in the intervals of MLTL are written in base-10 (decimal) notation, we can write formulas of size O(k), e.g., formulas of the form p U<sub>[0,2<sup>k</sup>]</sub> q, that relate points that are 2<sup>k</sup> apart.

The key is to express the fact that one Turing machine configuration is a proper successor of another configuration using a formula of size O(k). In the PSPACE-lower-bound proof of [45], LTL formulas of size O(k) relate successive configurations of k-space-bounded machines. Here, MLTL formulas of size O(k) relate successive configurations of 2<sup>k</sup>-time-bounded machines. Thus, we can write a formula ϕ<sub>M</sub> of length O(k) that forces the trace π to encode an accepting computation of M of length 2<sup>k</sup>.
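The arithmetic behind this succinctness argument is easy to check. The short Python sketch below (the function name is hypothetical) compares the distance 2<sup>k</sup> between corresponding cells with the number of decimal digits needed to write it as an interval bound; reaching the same distance with nested X operators would instead take 2<sup>k</sup> of them.

```python
def decimal_digits(n: int) -> int:
    """Number of base-10 digits needed to write the natural number n >= 1."""
    return len(str(n))


for k in (4, 10, 20, 40):
    distance = 2 ** k        # cells between corresponding tape positions
    cells = 4 ** k           # length of the trace encoding the computation
    print(f"k={k}: trace length {cells}, distance {distance}, "
          f"bound in U[0,{distance}] uses {decimal_digits(distance)} digits")
```

Since 2<sup>k</sup> has ⌊k·log<sub>10</sub>2⌋ + 1 digits, the interval constant costs O(k) symbols, which is what keeps ϕ<sub>M</sub> of size O(k).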

Now we consider MLTL<sub>0</sub> formulas and prove that checking the satisfiability of MLTL<sub>0</sub> formulas is PSPACE-complete. We first introduce the following lemma, which shows an inherent feature of MLTL<sub>0</sub> formulas.

**Lemma 2.** *The conjunction of* MLTL<sub>0</sub> *U-rooted formulas with identical operands is equivalent to the conjunct with the smallest interval range:* (ξ U<sub>[0,a]</sub> ψ) ∧ (ξ U<sub>[0,b]</sub> ψ) ≡ (ξ U<sub>[0,a]</sub> ψ)*, where* b > a*.*

*Proof.* We first prove that for i ≥ 0, the equivalence (ξ U<sub>[0,i]</sub> ψ) ∧ (ξ U<sub>[0,i+1]</sub> ψ) ≡ (ξ U<sub>[0,i]</sub> ψ) holds. When i = 0, we have (ξ U<sub>[0,0]</sub> ψ) ≡ f(ψ) and (ξ U<sub>[0,1]</sub> ψ) ≡ f(ψ) ∨ (f(ξ) ∧ X f(ψ)). So (ξ U<sub>[0,0]</sub> ψ) ∧ (ξ U<sub>[0,1]</sub> ψ) ≡ f(ψ) ≡ (ξ U<sub>[0,0]</sub> ψ) holds by absorption. Inductively, assume that (ξ U<sub>[0,k]</sub> ψ) ∧ (ξ U<sub>[0,k+1]</sub> ψ) ≡ (ξ U<sub>[0,k]</sub> ψ) holds for some k ≥ 0. When i = k + 1, we have (ξ U<sub>[0,k+1]</sub> ψ) ≡ f(ψ) ∨ (f(ξ) ∧ X(ξ U<sub>[0,k]</sub> ψ)) and (ξ U<sub>[0,k+2]</sub> ψ) ≡ f(ψ) ∨ (f(ξ) ∧ X(ξ U<sub>[0,k+1]</sub> ψ)). By the induction hypothesis, and because X distributes over ∧, the following equivalence holds:

$$\begin{array}{rcl} (\xi \,\mathcal{U}_{[0,k+1]}\, \psi) \wedge (\xi \,\mathcal{U}_{[0,k+2]}\, \psi) & \equiv & \big(f(\psi) \vee (f(\xi) \wedge \mathcal{X}(\xi \,\mathcal{U}_{[0,k]}\, \psi))\big) \wedge \big(f(\psi) \vee (f(\xi) \wedge \mathcal{X}(\xi \,\mathcal{U}_{[0,k+1]}\, \psi))\big) \\ & \equiv & f(\psi) \vee \big(f(\xi) \wedge \mathcal{X}(\xi \,\mathcal{U}_{[0,k]}\, \psi \wedge \xi \,\mathcal{U}_{[0,k+1]}\, \psi)\big) \\ & \equiv & f(\psi) \vee \big(f(\xi) \wedge \mathcal{X}(\xi \,\mathcal{U}_{[0,k]}\, \psi)\big) \\ & \equiv & (\xi \,\mathcal{U}_{[0,k+1]}\, \psi). \end{array}$$

Since (ξ U[0,i] ψ) ∧ (ξ U[0,i+1] ψ) ≡ (ξ U[0,i] ψ) holds for every i ≥ 0, a second induction on j shows that (ξ U[0,i] ψ) ∧ (ξ U[0,j] ψ) ≡ (ξ U[0,i] ψ) for every j > i. Taking i = a and j = b (recall that b > a) yields (ξ U[0,a] ψ) ∧ (ξ U[0,b] ψ) ≡ (ξ U[0,a] ψ).
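Lemma 2 can also be sanity-checked by brute force on short traces. The following minimal Python sketch assumes the finite-trace Until semantics as we read it from Sect. 4.3 (a formula is evaluated at position k of a trace, and the Until window is [a+k, b+k] truncated at the end of the trace), and restricts ξ and ψ to the atoms p and q. The helper names are hypothetical. It enumerates every nonempty trace over {p, q} up to length 5 and every pair a < b up to 4. Exhaustive testing on small instances is evidence, not a proof.

```python
from itertools import product

def until_holds(trace, k, xi, psi, a, b):
    """Check the atom formula xi U[a,b] psi at position k of a finite trace.

    Assumed semantics: len(trace) > a + k, and some i in [a+k, b+k] with
    i < len(trace) satisfies psi while xi holds at every j in [a+k, i).
    """
    n = len(trace)
    if n <= a + k:
        return False
    for i in range(a + k, min(b + k, n - 1) + 1):
        if psi in trace[i] and all(xi in trace[j] for j in range(a + k, i)):
            return True
    return False

def all_traces(atoms, max_len):
    """Yield every nonempty trace (a list of letters) over `atoms` up to `max_len`."""
    letters = [frozenset(x for x, bit in zip(atoms, bits) if bit)
               for bits in product([False, True], repeat=len(atoms))]
    for n in range(1, max_len + 1):
        for t in product(letters, repeat=n):
            yield list(t)

def lemma2_holds(max_len=5, max_bound=4):
    """Compare (p U[0,a] q) & (p U[0,b] q) with p U[0,a] q for all a < b."""
    for trace in all_traces(["p", "q"], max_len):
        for a in range(max_bound + 1):
            short = until_holds(trace, 0, "p", "q", 0, a)
            for b in range(a + 1, max_bound + 1):
                conj = short and until_holds(trace, 0, "p", "q", 0, b)
                if conj != short:
                    return False
    return True

print(lemma2_holds())  # True
```

The check succeeds because any witness position for the smaller window [0, a] also lies in the larger window [0, b], which is exactly the implication that the inductive proof establishes.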

**Lemma 3.** X*-free* LTL<sup>f</sup>-SAT *is reducible to* MLTL<sup>0</sup>-SAT *at a linear cost.*

*Proof.* According to [45], the satisfiability checking of X-free LTL formulas is still PSPACE-complete. This also applies to the satisfiability checking of X-free LTL<sup>f</sup> formulas. Given an X-free LTL<sup>f</sup> formula ϕ, we construct the corresponding MLTL formula m(ϕ) recursively as follows:

– m(p) = p, where p is an atom;

– m(¬ξ) = ¬m(ξ);

– m(ξ ∧ ψ) = m(ξ) ∧ m(ψ);

– m(ξ U ψ) = m(ξ) U[0,2<sup>|ϕ|</sup>] m(ψ).

Notably, for an Until LTL<sup>f</sup> formula, we bound the corresponding MLTL Until with the interval [0, 2<sup>|ϕ|</sup>], where ϕ is the original X-free LTL<sup>f</sup> formula. This is motivated by the fact that every satisfiable LTL<sup>f</sup> formula has a finite model whose length is less than 2<sup>|ϕ|</sup> [14]. The translation has only a linear blow-up because the integers in intervals use decimal notation, so the bound 2<sup>|ϕ|</sup> is written with O(|ϕ|) digits. We now prove by induction over the type of ϕ that ϕ is satisfiable iff m(ϕ) is satisfiable. That is, we prove that (⇒) π |= ϕ implies π |= m(ϕ) and (⇐) π |= m(ϕ) implies π |= ϕ, for some finite trace π.

We consider the Until formula η = ξ U ψ (noting that ϕ is fixed to the original LTL<sup>f</sup> formula); the proofs for the other cases are trivial. (⇒) If η is satisfiable, there is a finite trace π such that π |= η and |π| ≤ 2<sup>|ϕ|</sup> [14]. By the LTL<sup>f</sup> semantics, π |= η holds iff there is i ≥ 0 such that π<sup>i</sup> |= ψ and π<sup>j</sup> |= ξ for every 0 ≤ j < i. By the induction hypothesis, π<sup>i</sup> |= ψ implies π<sup>i</sup> |= m(ψ), and π<sup>j</sup> |= ξ implies π<sup>j</sup> |= m(ξ). Moreover, i ≤ 2<sup>|ϕ|</sup> holds because |π| ≤ 2<sup>|ϕ|</sup>. As a result, π |= η implies that there is 0 ≤ i ≤ 2<sup>|ϕ|</sup> such that π<sup>i</sup> |= m(ψ) and π<sup>j</sup> |= m(ξ) for every 0 ≤ j < i, so π |= m(η) by the MLTL semantics. (⇐) If m(η) is satisfiable, there is a finite trace π such that π |= m(η). By the MLTL semantics, there is 0 ≤ i ≤ 2<sup>|ϕ|</sup> such that π<sup>i</sup> |= m(ψ) and π<sup>j</sup> |= m(ξ) for every 0 ≤ j < i. By the induction hypothesis, π<sup>i</sup> |= m(ψ) implies π<sup>i</sup> |= ψ, and π<sup>j</sup> |= m(ξ) implies π<sup>j</sup> |= ξ. Since 0 ≤ i ≤ 2<sup>|ϕ|</sup> in particular implies 0 ≤ i, there is i ≥ 0 such that π<sup>i</sup> |= ψ and π<sup>j</sup> |= ξ for every 0 ≤ j < i; hence π |= η by the LTL<sup>f</sup> semantics.
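The translation m(·) of Lemma 3 is a plain structural recursion. The following Python sketch uses a hypothetical tuple-based AST and measures |ϕ| as the number of AST nodes, which is our assumption (the paper's exact size measure may differ). As in the construction above, every Until receives the same interval [0, 2<sup>|ϕ|</sup>], computed once from the original formula.

```python
# Hypothetical tuple AST for this sketch:
#   X-free LTLf: ("ap", name) | ("not", f) | ("and", f, g) | ("until", f, g)
#   MLTL output: the same, but Until carries its interval: ("until", f, g, lo, hi)

def size(phi):
    """|phi|, measured here as the number of AST nodes."""
    if phi[0] == "ap":
        return 1
    return 1 + sum(size(child) for child in phi[1:])

def to_mltl(phi):
    """Lemma 3's m(.): every Until gets the interval [0, 2^|phi|] of the whole formula."""
    bound = 2 ** size(phi)

    def m(f):
        tag = f[0]
        if tag == "ap":
            return f
        if tag == "not":
            return ("not", m(f[1]))
        if tag == "and":
            return ("and", m(f[1]), m(f[2]))
        if tag == "until":
            return ("until", m(f[1]), m(f[2]), 0, bound)
        raise ValueError(f"unsupported operator: {tag}")

    return m(phi)

def show(f):
    """Render an MLTL tuple as text, e.g. (a U[0,16] !b)."""
    tag = f[0]
    if tag == "ap":
        return f[1]
    if tag == "not":
        return "!" + show(f[1])
    if tag == "and":
        return f"({show(f[1])} & {show(f[2])})"
    return f"({show(f[1])} U[{f[3]},{f[4]}] {show(f[2])})"

phi = ("until", ("ap", "a"), ("not", ("ap", "b")))   # a U !b, four nodes
print(show(to_mltl(phi)))                             # (a U[0,16] !b)
```

Because 2<sup>|ϕ|</sup> has roughly 0.3 · |ϕ| decimal digits, each interval is written with O(|ϕ|) characters rather than 2<sup>|ϕ|</sup> symbols.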

**Theorem 2.** *Checking the satisfiability of* MLTL<sup>0</sup> *formulas is PSPACE-complete.*

*Proof.* Lemma 3 gives a linear reduction from X-free LTL<sup>f</sup>-SAT to MLTL<sup>0</sup>-SAT, and X-free LTL<sup>f</sup>-SAT is PSPACE-complete [14]. Hence MLTL<sup>0</sup>-SAT is PSPACE-hard, which establishes the lower bound.

For the upper bound, recall from the proof of Theorem 1 that an MLTL formula ϕ is translated to an LTL<sup>f</sup> formula θ of length K · |cl(ϕ)|, which, as noted there, is exponential in the size of the notation for K. Following the automata-theoretic approach to satisfiability, one would translate θ to an NFA and check its non-emptiness [14]. Normally, such a translation would involve another exponential blow-up. We show that this is not the case for MLTL<sup>0</sup>. Recall from the automaton construction in [14] that every state of the automaton is a set of subformulas of θ, so the size of a state is at most K · |cl(ϕ)|. In the general case, if ψ<sub>1</sub> and ψ<sub>2</sub> are two subformulas of θ corresponding to the MLTL formulas ξ U<sub>I<sub>1</sub></sub> ψ and ξ U<sub>I<sub>2</sub></sub> ψ, then ψ<sub>1</sub> and ψ<sub>2</sub> can appear in the same state of the automaton, so a state can indeed contain up to K · |cl(ϕ)| subformulas. When ϕ is restricted to MLTL<sup>0</sup>, this blow-up can be avoided: assuming I<sub>1</sub> ⊆ I<sub>2</sub>, we have (ψ<sub>1</sub> ∧ ψ<sub>2</sub>) ≡ ψ<sub>1</sub> by Lemma 2, so keeping only ψ<sub>1</sub> in the state suffices. Hence a state of the automaton for an MLTL<sup>0</sup> formula ϕ contains at most |cl(ϕ)| subformulas. Each subformula in the state can take K possible interval values (e.g., for ♦<sub>I</sub> ξ in the state, we can have ♦<sub>[0,1]</sub> ξ, ♦<sub>[0,2]</sub> ξ, etc.). Therefore the size of the automaton is in O(2<sup>|cl(ϕ)|</sup> · K<sup>|cl(ϕ)|</sup>) ≈ 2<sup>O(|cl(ϕ)|)</sup>, and MLTL<sup>0</sup> satisfiability checking is a PSPACE-complete problem.
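A sketch of why this count still fits in polynomial space, assuming K is written in decimal inside ϕ so that log<sub>2</sub> K = O(|ϕ|), is the following rewriting of the state count:

$$O\left(2^{|cl(\varphi)|} \cdot K^{|cl(\varphi)|}\right) = 2^{O\left(|cl(\varphi)|\,(1 + \log_2 K)\right)}$$

Each state is a choice of at most |cl(ϕ)| subformulas together with one interval bound between 0 and K per subformula, so it can be written down with O(|cl(ϕ)| · (1 + log<sub>2</sub> K)) bits, i.e., polynomially many in |ϕ|. An on-the-fly emptiness check that guesses an accepting run state by state, keeping only the current state and a step counter bounded by the number of states, therefore needs polynomial space, and NPSPACE = PSPACE by Savitch's theorem.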

#### **4 Implementation of** MLTL-SAT

We first show how to reduce MLTL-SAT to the well-explored LTL<sup>f</sup>-SAT and LTL-SAT. Then we introduce two new satisfiability-checking strategies based on the inherent properties of MLTL formulas, which are able to leverage state-of-the-art model-checking and SMT-solving techniques.

#### **4.1** MLTL-SAT **via Logic Translation**

For a formula ϕ from one logic and a formula ψ from another logic, we say that ϕ and ψ are *equisatisfiable* when ϕ is satisfiable under its semantics iff ψ is satisfiable under its semantics. Based on Lemma 1 and Theorem 1, we have the following corollary.

**Corollary 1 (**MLTL-SAT **to** LTL<sup>f</sup> -SAT**).** MLTL-SAT *can be reduced to* LTL<sup>f</sup> -SAT *with an exponential blow-up.*

From Corollary 1, MLTL-SAT is reducible to LTL<sup>f</sup>-SAT, enabling the use of off-the-shelf LTL<sup>f</sup> satisfiability solvers, cf. aaltaf [23]. It is also straightforward to consider MLTL-SAT via LTL-SAT; LTL-SAT has been studied for more than a decade, and many off-the-shelf LTL solvers are available, cf. [24,38,40].

**Theorem 3 (**MLTL **to** LTL**).** *For an* MLTL *formula* ϕ*, there is an* LTL *formula* θ *such that* ϕ *and* θ *are equi-satisfiable, and the size of* θ *is in* O(K · |cl(ϕ)|)*, where* K *is the maximal integer in* ϕ*.*

*Proof.* Lemma 1 provides a translation from the MLTL formula ϕ to an equivalent LTL<sup>f</sup> formula ϕ′, with a blow-up of O(K · |cl(ϕ)|). As shown in Sect. 2, there is a linear translation from the LTL<sup>f</sup> formula ϕ′ to an equi-satisfiable LTL formula θ [14]. Therefore, the blow-up from ϕ to θ is in O(K · |cl(ϕ)|).
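To see where an O(K · |cl(ϕ)|) bound comes from, the following Python sketch unfolds MLTL Until into LTL<sup>f</sup> using the rules visible in the proof of Lemma 2 and in the definition of T in Sect. 4.2: shift the interval under X while a > 0, and otherwise produce ψ ∨ (ξ ∧ X(...)). It is not the exact translation of Lemma 1, which is not reproduced here, and the tuple AST is hypothetical. We read the bound as counting distinct subformulas, so the sketch compares the size under maximal sharing with the size of the fully expanded syntax tree.

```python
from functools import lru_cache

# Hypothetical tuple AST for this sketch:
#   MLTL input:  ("ap", p) | ("not", f) | ("and", f, g) | ("until", f, g, a, b)
#   LTLf output: ("ap", p) | ("not", f) | ("and", f, g) | ("or", f, g) | ("next", f)

@lru_cache(maxsize=None)
def to_ltlf(f):
    """Unfold bounded Until into LTLf; repeated inputs are translated once (memoized)."""
    tag = f[0]
    if tag == "ap":
        return f
    if tag == "not":
        return ("not", to_ltlf(f[1]))
    if tag == "and":
        return ("and", to_ltlf(f[1]), to_ltlf(f[2]))
    if tag != "until":
        raise ValueError(f"unsupported operator: {tag}")
    xi, psi, a, b = f[1:]
    if a > 0:                  # X (xi U[a-1,b-1] psi)
        return ("next", to_ltlf(("until", xi, psi, a - 1, b - 1)))
    if b == 0:                 # xi U[0,0] psi is just psi
        return to_ltlf(psi)
    # psi or (xi and X (xi U[0,b-1] psi))
    rest = to_ltlf(("until", xi, psi, 0, b - 1))
    return ("or", to_ltlf(psi), ("and", to_ltlf(xi), ("next", rest)))

def dag_size(f):
    """Number of distinct subterms, i.e. the size under maximal sharing."""
    seen, stack = set(), [f]
    while stack:
        g = stack.pop()
        if g not in seen:
            seen.add(g)
            stack.extend(c for c in g[1:] if isinstance(c, tuple))
    return len(seen)

def tree_size(f):
    """Size of the fully expanded syntax tree (no sharing)."""
    return 1 + sum(tree_size(c) for c in f[1:] if isinstance(c, tuple))

p, q, r = ("ap", "p"), ("ap", "q"), ("ap", "r")
nested = ("until", ("until", p, q, 0, 10), r, 0, 10)
ltlf = to_ltlf(nested)
print(dag_size(ltlf), tree_size(ltlf))   # 63 551
```

For this nested Until, the shared representation grows linearly with K (6K + 3 nodes), while the expanded tree grows quadratically, because the inner translation is copied once per unfolding step of the outer one.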

**Corollary 2 (**MLTL-SAT **to** LTL-SAT**).** MLTL-SAT *can be reduced to* LTL-SAT *with an exponential blow-up.*

Since MLTL-SAT is reducible to LTL-SAT, MLTL-SAT can also benefit from the power of LTL satisfiability solvers. Moreover, the reduction from MLTL-SAT to LTL-SAT enables leveraging modern model-checking techniques to solve the MLTL-SAT problem, due to the fact that LTL-SAT has been shown to be reducible to model checking with a linear blow-up [38,39].

**Corollary 3 (**MLTL-SAT **to** LTL**-Model-Checking).** MLTL-SAT *can be reduced to* LTL *model checking with an exponential blow-up.*

In our implementation, we choose the model checker nuXmv [12] for LTL satisfiability checking, as it allows an LTL formula to be directly input as the temporal specification together with a universal model as described in [38,39].
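The universal-model reduction can be sketched as a small generator of nuXmv input. In the following Python sketch (the helper name is hypothetical), every atom becomes an unconstrained boolean state variable, so the paths of the model range over all infinite traces over those atoms. Checking LTLSPEC !(θ) then fails exactly when θ is satisfiable, and the counterexample path is a witness. The formula string is assumed to already use nuXmv's LTL syntax (X, F, G, U, &, |, !), and atom names are assumed to be valid nuXmv identifiers that do not clash with keywords.

```python
def universal_smv(atoms, ltl_formula):
    """Return nuXmv input whose paths are all infinite traces over `atoms`.

    The model violates LTLSPEC !(theta) iff theta is satisfiable; a
    counterexample path is then a model of theta.
    """
    lines = ["MODULE main", "VAR"]
    # Unconstrained booleans: no INIT/TRANS/ASSIGN, so any valuation may occur at any step.
    lines += [f"  {p} : boolean;" for p in sorted(atoms)]
    lines.append(f"LTLSPEC !({ltl_formula});")
    return "\n".join(lines) + "\n"

print(universal_smv({"a", "b"}, "F (a & X b)"))
```

The generated text can be saved as an .smv file and handed to nuXmv's LTL checking; the commands actually used in the experiments are those shown in Fig. 1.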

#### **4.2 Model Generation**

Using the LTL formula as the temporal specification in nuXmv has been shown, however, not to be the most efficient way to use model checking for satisfiability checking [40]. Consider the MLTL formula ♦[0,10]a ∧ ♦[1,11]a. The translated LTL<sup>f</sup> formula is f(♦[0,10]a) ∧ X(f(♦[0,10]a)), in which f(♦[0,10]a) has to be constructed twice. To avoid such redundant construction, we follow [40] and directly encode the input MLTL formula as an SMV model (the input model of nuXmv), rather than using the LTL formula obtained from the input MLTL formula as a specification.

An SMV [27] model consists of a Boolean transition system Sys = (V, I,T), where V is a set of Boolean variables, I is a Boolean formula representing the initial states of Sys, and T is the Boolean transition formula. Moreover, a specification to be verified against the system is also contained in the SMV model (here we focus on the LTL specification). Given the input MLTL formula ϕ, we construct the corresponding SMV model M<sup>ϕ</sup> as follows.


<sup>2</sup> A temporary variable is introduced in the DEFINE statement rather than the VAR statement of the SMV model, as it will be automatically replaced with those in VAR statements.

1. e(ψ) = ψ, if ψ is a Boolean atom;
2. e(ψ) = ¬e(ψ<sub>1</sub>), if ψ = ¬ψ<sub>1</sub>;


$$T_{\psi_1 \mathcal{U}_{[a,b]} \psi_2} = \begin{cases} \mathcal{X}_\ast(\psi_1 \mathcal{U}_{[a-1,b-1]} \psi_2), & \text{if } 0 < a \le b; \\ e(\psi_2) \vee (e(\psi_1) \wedge \mathcal{X}_\ast(\psi_1 \mathcal{U}_{[0,b-1]} \psi_2)), & \text{if } a = 0 \text{ and } 0 < b; \\ e(\psi_2), & \text{if } a = 0 \text{ and } b = 0. \end{cases}$$


**Encoding Heuristics for** MLTL<sup>0</sup> **Formulas.** We also encode the rules shown in Lemma 2 to prune the state space when checking the satisfiability of MLTL<sup>0</sup> formulas. These rules are encoded using the INVAR constraint in the SMV model. Taking the U formula as an example, we encode T(ψ<sub>1</sub> U[0,a] ψ<sub>2</sub>) ∧ T(ψ<sub>1</sub> U[0,a−1] ψ<sub>2</sub>) ↔ T(ψ<sub>1</sub> U[0,a−1] ψ<sub>2</sub>) (a > 0) for each ψ<sub>1</sub> U[0,a] ψ<sub>2</sub> in cl∗(ϕ). Similar encodings also apply to the R formulas in cl∗(ϕ). Theorem 4 below guarantees the correctness of the translation; it can be proved by induction over the type of ϕ and the construction of the SMV model.
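The INVAR pruning constraints are regular enough to be generated mechanically. The following Python sketch emits one constraint per unfolded bound a > 0. It assumes that cl∗(ϕ) contains ψ<sub>1</sub> U[0,a] ψ<sub>2</sub> for every a from 0 up to the original bound, and that T(ψ<sub>1</sub> U[0,a] ψ<sub>2</sub>) is available in the SMV model under some identifier. The naming scheme T_&lt;id&gt;_&lt;a&gt; and the helper itself are hypothetical.

```python
def mltl0_invars(until_id, max_bound):
    """Emit Lemma 2 pruning constraints for xi U[0,a] psi, 0 < a <= max_bound.

    T_<until_id>_<a> is a hypothetical name for T(xi U[0,a] psi) in the SMV model.
    """
    def t(a):
        return f"T_{until_id}_{a}"

    # (T_a & T_{a-1}) <-> T_{a-1}, i.e. xi U[0,a-1] psi implies xi U[0,a] psi.
    return [f"INVAR ({t(a)} & {t(a - 1)}) <-> {t(a - 1)};"
            for a in range(1, max_bound + 1)]

for line in mltl0_invars("u0", 3):
    print(line)
```

Propositionally, (A ∧ B) ↔ B is the same as B → A, so each generated constraint states that the formula with the smaller interval implies the one with the larger interval, which is the content of Lemma 2.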

**Theorem 4.** *The* MLTL *formula* ϕ *is satisfiable iff the corresponding* SMV *model* M<sup>ϕ</sup> *violates the* LTL *property* □¬Tail*.*

There are different techniques that can be used for LTL model checking. Based on the latest evaluation of LTL satisfiability checking [24], the KLIVE [13] back-end implemented in the SMV model checker nuXmv [12] produces the best performance. We thus choose KLIVE as our model-checking technique for MLTL-SAT.

**Bounded** MLTL-SAT**.** Although MLTL-SAT is reducible to the satisfiability problem of other well-explored logics, with established off-the-shelf satisfiability solvers, a dedicated solution based on inherent properties of MLTL may be superior. One intuition is, since all intervals in MLTL formulas are bounded, the satisfiability of the formula can be reduced to Bounded Model Checking (BMC) [9].

**Theorem 5.** *Given an* MLTL *formula* ϕ *with* K *as the largest natural in the intervals of* ϕ*,* ϕ *is satisfiable iff there is a finite trace* π *with* |π| ≤ K · |cl(ϕ)| *such that* π |= ϕ*.*

Theorem 5 states that the satisfiability of a given MLTL formula can be reduced to checking for the existence of a satisfying trace of bounded length. To apply the BMC technique in nuXmv, we compute the maximal BMC depth as K · |cl(ϕ)| for the given MLTL formula ϕ and set it accordingly. The input SMV model for BMC is still Mϕ, as described in Sect. 4.2. **However, to ensure correct** BMC **checking in** nuXmv**, the constraint "FAIRNESS TRUE" has to be added to the** SMV **model.**<sup>3</sup> The LTLSPEC remains □¬Tail. According to Theorem 5, ϕ is satisfiable iff the model checker returns a counterexample using the BMC technique within the maximal depth of K · |cl(ϕ)|.
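Theorem 5 can be illustrated without a model checker by enumerating traces explicitly. The following Python sketch replaces nuXmv's SAT-based BMC with naive enumeration, which is feasible only for tiny formulas. It approximates cl(ϕ) by the set of distinct syntactic subformulas, uses max(K, 1) so that formulas without intervals still get a positive depth, and reads the MLTL semantics as in Sect. 4.3, with the trace length taken as the total length. All helper names and the tuple AST are hypothetical.

```python
from itertools import product

# Hypothetical tuple AST: ("ap", p) | ("not", f) | ("and", f, g) | ("until", f, g, a, b)

def holds(f, trace, k=0):
    """MLTL satisfaction at position k of a finite trace (a list of sets of atoms)."""
    n, tag = len(trace), f[0]
    if tag == "ap":
        return k < n and f[1] in trace[k]
    if tag == "not":
        return k < n and not holds(f[1], trace, k)
    if tag == "and":
        return holds(f[1], trace, k) and holds(f[2], trace, k)
    xi, psi, a, b = f[1:]
    if n <= k + a:
        return False
    return any(holds(psi, trace, i) and all(holds(xi, trace, j) for j in range(k + a, i))
               for i in range(k + a, min(k + b, n - 1) + 1))

def subformulas(f):
    """Distinct syntactic subformulas, used here as a stand-in for cl(f)."""
    result = {f}
    for child in f[1:]:
        if isinstance(child, tuple):
            result |= subformulas(child)
    return result

def bounded_sat(f):
    """Search all traces up to length max(K, 1) * |cl(f)|; return a witness or None."""
    subs = subformulas(f)
    props = sorted(g[1] for g in subs if g[0] == "ap")
    k_max = max((g[4] for g in subs if g[0] == "until"), default=0)
    depth = max(k_max, 1) * len(subs)
    letters = [frozenset(p for p, bit in zip(props, bits) if bit)
               for bits in product([False, True], repeat=len(props))]
    for n in range(1, depth + 1):
        for trace in product(letters, repeat=n):
            if holds(f, list(trace)):
                return list(trace)
    return None

print(bounded_sat(("until", ("ap", "p"), ("ap", "q"), 2, 3)))
```

Searching lengths in increasing order returns a shortest witness, which mirrors the way BMC finds short counterexamples first; for an unsatisfiable formula such as q ∧ ¬q, the search exhausts all traces up to the bound and returns None.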

#### **4.3** MLTL-SAT **via** SMT **Solving**

Another approach to solve MLTL-SAT is via SMT solving, considering that using SMT solvers to handle intervals in MLTL formulas is straightforward. Since the input logic of SMT solvers is First-Order Logic, we must first translate the MLTL formula to its equisatisfiable formula in First-Order Logic over the natural domain N. We assume that readers are familiar with First-Order Logic and only focus on the translation. Given an MLTL formula ϕ and the alphabet Σ, we construct the corresponding formula in First-Order Logic over N in the following way.

- fol(true, k, len) = (len > k) and fol(false, k, len) = false;
- fol(p, k, len) = (len > k) ∧ f<sub>p</sub>(k) for p ∈ Σ;
- fol(¬ξ, k, len) = (len > k) ∧ ¬fol(ξ, k, len);
- fol(ξ ∧ ψ, k, len) = (len > k) ∧ fol(ξ, k, len) ∧ fol(ψ, k, len);
- fol(ξ U[a,b] ψ, k, len) = (len > a + k) ∧ ∃i.((a + k ≤ i ≤ b + k) ∧ fol(ψ, i, len − i) ∧ ∀j.((a + k ≤ j < i) → fol(ξ, j, len − j))).

In the formula fol(ϕ, k, len), k represents the index of the (finite) trace from which ϕ is evaluated, and len indicates the length of the suffix of the trace starting from the index k. Since the formula is constructed recursively, we need to introduce k to record the index. Meanwhile, len is necessary because the MLTL semantics, which is interpreted over finite traces, constrains the lengths of the satisfying traces of the Until formulas. The following theorem guarantees that MLTL-SAT is reducible to the satisfiability of First-Order Logic.

**Theorem 6.** *For an* MLTL *formula* ϕ*,* ϕ *is satisfiable iff the corresponding First-Order Logic formula* ∃len.*fol*(ϕ, 0, len) *is satisfiable.*

*Proof.* Let Σ be the alphabet of ϕ, and let π ∈ (2<sup>Σ</sup>)<sup>∗</sup> be a finite trace. For each p ∈ Σ, we define the function f<sub>p</sub> : Int → Bool such that, for 0 ≤ k < |π|, f<sub>p</sub>(k) = true iff p ∈ π[k]. We now prove, by induction over the type of ϕ and the construction of fol(ϕ, k, len) with respect to ϕ, that π<sup>k</sup> |= ϕ holds iff {f<sub>p</sub> | p ∈ Σ} is a model of fol(ϕ, k, |π|), where |π| is the length of π. The cases when ϕ is true or false are trivial.

– If ϕ = p is an atom, π<sup>k</sup> |= ϕ holds iff k < |π| and p ∈ π[k] (i.e., p ∈ π<sup>k</sup>[0]), which means |π| > k and f<sub>p</sub>(k) = true. This is exactly fol(p, k, |π|), so π<sup>k</sup> |= ϕ holds iff {f<sub>p</sub> | p ∈ Σ} is a model of fol(ϕ, k, |π|).

<sup>3</sup> Based on comments in emails from the nuXmv developers.


This proof holds for all values of k, including the special case where k = 0.

We then encode ∃len.fol(ϕ, 0, len) into the SMT-LIB v2 format [7], which is the input format of most modern SMT solvers; we call the full SMT-LIB v2 encoding SMT(ϕ). We first use the "declare-fun" command to declare a function f<sub>p</sub> : Int → Bool for each p ∈ Σ. We also define the function f<sub>ϕ</sub> : Int × Int → Bool for the First-Order Logic formula fol(ϕ, k, len). The corresponding SMT-LIB v2 command is "define-fun f<sub>ϕ</sub> ((k Int) (len Int)) Bool S(fol(ϕ, k, len))", where S(fol(ϕ, k, len)) is the SMT-LIB v2 implementation of fol(ϕ, k, len). In detail, S(fol(ϕ, k, len)) is obtained recursively as follows.


Finally, we use the "assert" command "(assert (exists ((len Int)) (f<sub>ϕ</sub> 0 len)))" together with the "(check-sat)" command to ask the SMT solver whether ∃len.fol(ϕ, 0, len) is satisfiable. In a nutshell, the general framework of the SMT-LIB v2 format for SMT(ϕ) (i.e., ∃len.fol(ϕ, 0, len)) is shown in Table 1, and its correctness is guaranteed by Theorem 7 below.

**Table 1.** The SMT-LIB v2 template for SMT(ϕ).

    ; declare the corresponding function f_a for each a ∈ Σ
    (declare-fun f_a (Int) Bool)
    ...
    ; define the function for fol(ϕ, k, len)
    (define-fun f_ϕ ((k Int) (len Int)) Bool S(fol(ϕ, k, len)))
    (assert (exists ((len Int)) (f_ϕ 0 len)))
    (check-sat)

**Theorem 7.** *The First-Order Logic formula* ∃len.fol(ϕ, 0, len) *is satisfiable iff the* SMT *solver returns SAT with the input* SMT(ϕ)*.*

An inductive proof for the theorem can be conducted according to the construction of SMT(ϕ). Notably, there is no difference between the SMT encoding for MLTL formulas and that for MLTL<sup>0</sup> formulas, as the SMT-based encoding does not require unrolling the temporal operators in the formula.
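The template of Table 1 can be instantiated by a small generator. In the following Python sketch, each distinct subformula gets its own define-fun, emitted children-first so that every body refers only to functions already defined, which keeps the script linear in the number of subformulas. The recursion S(·) is not spelled out above, so the bodies below are our own rendering of the fol rules. In particular, len is treated as the total trace length, as in the proof of Theorem 6, and is passed unchanged to subformulas. The AST and the function names phi_0, phi_1, and so on are hypothetical.

```python
# Hypothetical tuple AST: ("ap", p) | ("not", f) | ("and", f, g) | ("until", f, g, a, b)

def to_smtlib(phi):
    """Emit an SMT-LIB v2 script shaped like Table 1, one define-fun per subformula."""
    names, defs, atoms = {}, [], set()

    def name(f):
        if f in names:                     # shared subformulas are defined once
            return names[f]
        tag = f[0]
        if tag == "ap":
            atoms.add(f[1])
            body = f"(and (> len k) (f_{f[1]} k))"
        elif tag == "not":
            body = f"(and (> len k) (not ({name(f[1])} k len)))"
        elif tag == "and":
            body = f"(and (> len k) ({name(f[1])} k len) ({name(f[2])} k len))"
        elif tag == "until":
            xi, psi, a, b = name(f[1]), name(f[2]), f[3], f[4]
            body = (f"(and (> len (+ {a} k)) (exists ((i Int)) (and "
                    f"(<= (+ {a} k) i) (<= i (+ {b} k)) ({psi} i len) "
                    f"(forall ((j Int)) (=> (and (<= (+ {a} k) j) (< j i)) "
                    f"({xi} j len))))))")
        else:
            raise ValueError(f"unsupported operator: {tag}")
        names[f] = f"phi_{len(names)}"     # children were numbered first
        defs.append(f"(define-fun {names[f]} ((k Int) (len Int)) Bool {body})")
        return names[f]

    top = name(phi)
    decls = [f"(declare-fun f_{p} (Int) Bool)" for p in sorted(atoms)]
    tail = [f"(assert (exists ((len Int)) ({top} 0 len)))", "(check-sat)"]
    return "\n".join(decls + defs + tail) + "\n"

print(to_smtlib(("until", ("ap", "p"), ("not", ("ap", "q")), 1, 3)))
```

Saved to a file with the .smt2 extension, such a script can be passed to the z3 command-line tool.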

#### **5 Experimental Evaluations**

**Tools and Platform.** We implemented the translator MLTLconverter in C++, including encodings for an MLTL formula as equi-satisfiable LTL and LTL<sup>f</sup> formulas, and corresponding SMV and SMT-LIB v2 models. We leverage the extant LTL solver aalta [24], LTL<sup>f</sup> solver aaltaf [23], SMV model checker nuXmv [12], and the SMT solver Z3 [29] to check the satisfiability of the input MLTL formula in their respective encodings from MLTLconverter. The solvers, including the runtime flags we used, are summarized in Table 2. We evaluated both BMC and KLIVE [13] model-checking back-ends in nuXmv, and the corresponding commands are shown in Fig. 1. Notably in the figure, the maximal length "*MAX*" to run BMC is computed dynamically for each MLTL formula, based on Theorem 5.


**Table 2.** List of solvers and their runtime flags.

**Fig. 1.** nuXmv commands for BMC (left) and KLIVE (right).

All experiments were executed on Rice University's NOTS cluster,<sup>4</sup> running Red-Hat 5, with 226 dual-socket compute blades housed within HPE s6500, HPE Apollo 2000, and Dell PowerEdge C6400 chassis. All the nodes are interconnected with a 10 GigE network. Each satisfiability check over one MLTL formula and one solver was executed with exclusive access to one CPU and 8 GB RAM with a timeout of one hour, as measured by the Linux time command. We assigned a time penalty of one hour to benchmarks that caused a segmentation fault or timed out.

**Experimental Goals.** We evaluate performance along three metrics. (1) Each satisfiability check has two parts: the encoding time (consumed by MLTLconverter) and the solving time (consumed by solvers). We evaluate how each encoding affects the performance of both stages of MLTL-SAT. (2) We comparatively analyze the performance and scalability of end-to-end MLTL-SAT via LTL-SAT, LTL<sup>f</sup> -SAT, LTL model checking, and our new SMT-based approach. (3) We evaluate the performance and scalability for MLTL<sup>0</sup> satisfiability checking using MLTL0-SAT encoding heuristics (Lemma 2).

**Benchmarks.** There are few MLTL (or even MTL-over-naturals) benchmarks available for evaluation. Previous works on MTL-over-naturals [2–4] mainly focus on the theoretic exploration of the logic. To enable rigorous experimental evaluation, we develop three types of benchmarks, motivated by the generation of LTL benchmarks [38].<sup>5</sup>

(1) *Random* MLTL *Formulas (*R*)*: We generated 10,000 R formulas, varying the formula length L (20, 40, 60, 80, 100), the number of variables N (1, 2, 3, 4, 5), and the probability P of the appearance of the U operator (0.33, 0.5, 0.7, 0.95); for each (L, N, P) we generated 100 formulas. For every U operator, we randomly chose an interval [i, j] where i ≥ 0 and j ≤ 100.
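A random generator in the spirit of the R and R0 benchmarks can be sketched as follows. This is a hypothetical illustration, not the generator used for the experiments: the node mix (Until with probability P, otherwise ¬ or ∧ with equal probability), the use of L as an approximate node budget, and the atom names p0, p1, and so on are our assumptions. Setting start_at_zero=True yields MLTL<sup>0</sup> formulas, as in the R0 set.

```python
import random

def random_mltl(length, num_vars, p_until, max_bound=100, start_at_zero=False, rng=None):
    """Return a random MLTL formula string built from about `length` nodes.

    Hypothetical generator: each inner node is an Until with probability
    p_until, otherwise a negation or a conjunction with equal probability.
    """
    rng = rng or random.Random()
    atoms = [f"p{i}" for i in range(num_vars)]

    def gen(budget):
        if budget <= 1:
            return rng.choice(atoms)
        if rng.random() < p_until:
            lo = 0 if start_at_zero else rng.randint(0, max_bound)
            hi = rng.randint(lo, max_bound)      # interval [lo, hi], lo <= hi <= max_bound
            left = (budget - 1) // 2
            return f"({gen(left)} U[{lo},{hi}] {gen(budget - 1 - left)})"
        if rng.random() < 0.5:
            return f"!{gen(budget - 1)}"
        left = (budget - 1) // 2
        return f"({gen(left)} & {gen(budget - 1 - left)})"

    return gen(length)

# The (L, N, P) grid from the text, 100 formulas per combination.
rng = random.Random(2019)
suite = [random_mltl(L, N, P, rng=rng)
         for L in (20, 40, 60, 80, 100)
         for N in range(1, 6)
         for P in (0.33, 0.5, 0.7, 0.95)
         for _ in range(100)]
print(len(suite))  # 10000
```

Passing an explicitly seeded random.Random makes a benchmark set reproducible, which matters when several solvers must be run on exactly the same formulas.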

**Fig. 3.** Cactus plot for different MLTL solving approaches on R formulas: LTL-SAT and LTL*<sup>f</sup>* -SAT lines overlap.

<sup>4</sup> https://docs.rice.edu/confluence/display/CD/NOTS+Overview.

<sup>5</sup> All experimental materials are at http://temporallogic.org/research/CAV19/. The plots are best viewed online.

(2) *NASA-Boeing* MLTL *Formulas (*NB*)*: We use challenging benchmarks [15] created from projects at NASA [17,26] and Boeing [11]. We extract 63 real-life LTL requirements from the SMV models of the benchmarks, and then randomly generate an interval for each temporal operator. (We replace each X with □[1,1].) We create 3 groups of such formulas (63 in each) to test the scalability of different approaches, restricting the maximal interval range to 1,000, 10,000, and 100,000, respectively.

(3) *Random* MLTL<sup>0</sup> *Formulas (*R0*)*: We generated 500 R0 formulas in the same way as the R formulas, except that every generated interval was restricted to start from 0; we generated five formulas for each (L, N, P). This small set of R0 benchmarks serves to compare the performance on MLTL<sup>0</sup> formulas whose SMV encodings were created with and without heuristics.

**Correctness Checking.** We compared the verdicts from all solvers for every test instance and found no inconsistencies, excluding segmentation faults. This exercise aided the verification of our translator implementations, including diagnosing the need to include FAIRNESS TRUE in BMC models.

**Experimental Results.** Figure 2 compares encoding times for the R benchmark formulas. We find that (1) encoding MLTL as either LTL or LTL<sup>f</sup> is not scalable even when the intervals in the formula are small; and (2) the cost of MLTL-to-SMV encoding is comparable to that of MLTL-to-SMT-LIB v2 encoding. Although the costs of encoding MLTL as LTL/LTL<sup>f</sup> and as SMV are all in O(K · |cl(ϕ)|), where K is the maximal interval length in ϕ, the practical gap between the LTL/LTL<sup>f</sup> encodings and the SMV encoding affirms our conjecture that the SMV model is in general more compact than the corresponding LTL/LTL<sup>f</sup> formulas. Also, because K is kept small in the R formulas, the encoding costs of SMV and SMT-LIB v2 are comparable.

Figure 3 shows total satisfiability checking times for the R benchmarks. Recall that the inputs of both the BMC and KLIVE approaches are SMV models. MLTL-SAT via KLIVE is the fastest solving strategy for MLTL formulas with interval ranges of less than 100. The ratio of satisfiable to unsatisfiable formulas in this benchmark is approximately 4:1. Although BMC is known to be good at detecting short counterexamples, it does not perform as well as the KLIVE and SMT approaches on satisfiable formulas, since only long counterexamples (with length greater than 1000) exist for most of these formulas. While nuXmv successfully checked all such models, Fig. 4 shows that increasing the interval range constraint results in segmentation faults; more than half of our benchmarks produced this outcome for formulas with allowed interval ranges of up to 600. Meanwhile, the solving approaches via LTL-SAT/LTL<sup>f</sup>-SAT are not competitive for any interval range.

The SMT-based approach dominates the model-checking approaches on the scalable NB benchmarks, as shown in Fig. 5. Here, e.g., "BMC-1000" means using BMC to check the group of benchmarks with a maximal interval range of 1,000. "BMC-1000" and "KLIVE-1000" perform almost identically because both are dominated by segmentation faults: the SMV models generated by our translator MLTLconverter are too large for nuXmv to handle. The performance of the model-checking approaches is thus constrained by the scalability of the model checker (nuXmv). The SMT encoding, however, does not face such a bottleneck; see "Z3-1000," "Z3-10000," and "Z3-100000" in Fig. 5. We conclude that the SMT approach is the best available strategy for MLTL satisfiability checking.

**Fig. 4.** Proportion of segmentation faults for sets of 200 R formulas with maximal interval ranges varying from 100 to 1000.

**Fig. 5.** Cactus plot for the BMC, KLIVE, and SMT-solving approaches on the NB benchmarks; BMC and KLIVE overlap.

**Fig. 6.** Scatter plot for both the BMC and KLIVE approaches to checking MLTL<sup>0</sup> formulas with/without encoding heuristics.

Finally, we evaluated the model-checking-based approaches on the R0 formulas, motivated by the exponential gap between the theoretical complexities of MLTL-SAT and MLTL<sup>0</sup>-SAT. Figure 6 compares the performance of satisfiability solving via the BMC and KLIVE approaches. There is no significant improvement when the SMV encoding heuristics for MLTL<sup>0</sup> are applied. For the BMC solving approach, performance is largely unaffected by the encoding heuristics; for the KLIVE solving approach, the heuristics decrease solving performance. The results support the well-known phenomenon that theoretical analysis and practical evaluation do not always match.

We summarize with three conclusions. (1) For satisfiability checking of MLTL formulas, the new SMT-based approach is best. (2) For satisfiability checking of MLTL formulas with interval ranges less than 100, the MLTL-SAT via KLIVE approach is fastest. (3) The dedicated encoding heuristics for MLTL<sup>0</sup> do not significantly improve the satisfiability checking time of MLTL0-SAT over MLTL-SAT. They do not solve the nuXmv scalability problem.

#### **6 Discussion and Conclusion**

Metric Temporal Logic (MTL) was first introduced in [3], for describing continuous behaviors interpreted over infinite real-time traces. The later variants Metric Interval Temporal Logic (MITL) [5], and Bounded Metric Temporal Logic (BMTL) [30] are also interpreted over infinite traces. Intuitively, MLTL is a combination of MITL and BMTL that allows only bounded, discrete (over natural domain) intervals that are interpreted over finite traces. There are several previous works on the satisfiability of MITL, though their tools only support the infinite semantics. Bounded satisfiability checking for MITL formulas is proposed in [33], and the reduction from MITL to LTL is presented in [20]. Since previous works focus on MITL over infinite traces and there is no trivial way to reduce MLTL over finite traces to MITL over infinite traces, the previous methodologies are not comparable to those presented in this paper. This includes the SMT-based solution of reducing MITL formulas to equi-satisfiable Constraint LTL formulas [8]. Compared to that, our new SMT-based approach more directly encodes MLTL formulas into the SMT language without translation through an intermediate language.

The contribution of a complete, correct, and open-source MLTL satisfiability checking algorithm and tool opens up avenues for a myriad of future directions, as it now makes possible both the specification debugging of MLTL formulas in design-time verification and benchmark generation for runtime verification. We plan to explore alternative encodings for improving the performance of MLTL satisfiability checking and to work toward an optimized multi-encoding approach, following the style of the previous study for LTL [40]; the current SMT model generated from an MLTL formula uses a relatively simple theory (uninterpreted functions). We also plan to explore lazy encodings from MLTL formulas to SMT models. For example, instead of encoding the whole MLTL formula into a monolithic SMT model, we may be able to decrease overall satisfiability-solving time by encoding the MLTL formula in parts with dynamic ordering similar to [15]. To make the output of SMT-based MLTL satisfiability checking more usable, we plan to investigate translations from the functions returned by Z3 for satisfiable instances into more easily parsable satisfying assignments.

**Acknowledgment.** We thank anonymous reviewers for their helpful comments. This work is supported by NASA ECF NNX16AR57G, NSF CAREER Award CNS-1552934, NSF grants IIS-1527668, IIS-1830549, and by NSF Expeditions in Computing project "ExCAPE: Expeditions in Computer Augmented Program Engineering."

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **High-Level Abstractions for Simplifying Extended String Constraints in SMT**

Andrew Reynolds<sup>1</sup>, Andres Nötzli<sup>2</sup>(✉), Clark Barrett<sup>2</sup>, and Cesare Tinelli<sup>1</sup>

<sup>1</sup> Department of Computer Science, The University of Iowa, Iowa City, USA <sup>2</sup> Department of Computer Science, Stanford University, Stanford, USA noetzli@cs.stanford.edu

**Abstract.** Satisfiability Modulo Theories (SMT) solvers with support for the theory of strings have recently emerged as powerful tools for reasoning about string-manipulating programs. However, due to the complex semantics of *extended string functions*, it is challenging to develop scalable solvers for the string constraints produced by program analysis tools. We identify several classes of simplification techniques that are critical for the efficient processing of string constraints in SMT solvers. These techniques can reduce the size and complexity of input constraints by reasoning about arithmetic entailment, multisets, and string containment relationships over input terms. We provide experimental evidence that implementing them results in significant improvements over the performance of state-of-the-art SMT solvers for extended string constraints.

#### **1 Introduction**

Most programming languages support strings natively and a considerable number of programs perform some form of string manipulation. Automated reasoning about string-manipulating programs for verification and test case generation purposes is then highly relevant for these languages and programs. Applications to security, such as finding SQL injection and XSS vulnerabilities in web applications [16,18,23] or proving their absence, are of critical importance. String constraints have also been used to generate relational database tables from SQL queries for unit testing purposes [21]. These applications require modeling all of the string operations that appear in real programs. This is challenging since some of those operations are complex and often realized by iterative applications of simpler operations. Additionally, since strings in many programming languages have variable length, reasoning accurately about them cannot be done by a reduction to bounded types such as bit-vectors, and requires instead the development of solvers for *unbounded* strings. To make this type of reasoning more scalable, the use of dedicated theory solvers natively supporting common string operations has been proposed [5,9]. Some string solvers are fully integrated within Satisfiability Modulo Theories (SMT) solvers [4,12]; some are built (externally) on top of such solvers [9,16,19]; and others are independent of SMT solvers [23].

A major challenge in developing solvers for unbounded string constraints is the complex semantics of *extended string functions* beyond the basic operations of string concatenation and equality. Extended functions include replace, which replaces a string in another string, and indexof, which returns the position of a string in another string. Another challenge is that constraints using extended functions are often combined with constraints over other theories, e.g. integer constraints over string lengths or applications of indexof, which requires the involvement of solvers for those theories. Current string solvers address these challenges by reducing constraints with extended string functions to typically more verbose constraints over basic functions. As with every reduction, some of the higher level structure of the problem may be lost, with negative repercussions on the performance and scalability.
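To make the two extended functions concrete, the following Python sketch models their behavior as we understand the SMT-LIB 2.6 string theory: str.replace replaces only the first occurrence (and prepends when the pattern is empty), and str.indexof returns -1 when there is no occurrence at or after the start position or when the start position is out of range. It also brute-forces one simple length fact, len(replace(x, y, z)) ≤ len(x) + len(z), as an example of an arithmetic relationship that holds without knowing which occurrence, if any, is replaced. This is an illustration only, not cvc4's implementation.

```python
from itertools import product

def str_replace(s, t, u):
    """Model of SMT-LIB str.replace: replace the first occurrence of t in s by u.

    If t does not occur, s is returned unchanged; if t is empty, u is prepended.
    """
    i = s.find(t)
    return s if i < 0 else s[:i] + u + s[i + len(t):]

def str_indexof(s, t, start):
    """Model of SMT-LIB str.indexof: first position >= start where t occurs in s, or -1."""
    if start < 0 or start > len(s):
        return -1
    return s.find(t, start)

def replace_length_bound_holds(alphabet="ab", max_len=3):
    """Brute-force check of len(replace(x, y, z)) <= len(x) + len(z) on short strings."""
    words = [""] + ["".join(w) for n in range(1, max_len + 1)
                    for w in product(alphabet, repeat=n)]
    return all(len(str_replace(x, y, z)) <= len(x) + len(z)
               for x in words for y in words for z in words)

print(str_replace("abcb", "b", "x"), str_indexof("abcb", "b", 2))  # axcb 3
print(replace_length_bound_holds())  # True
```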

To address this issue, we have developed new techniques that reason about constraints with extended string operators before they are reduced to simpler ones. This analysis of complex terms can often eliminate the need for expensive reductions. The techniques are based on reasoning about relationships over strings with high-level abstractions, such as their arithmetic relationships (e.g., reasoning about their length), their string containment relationships, and their relationships as multisets of characters. We have implemented these techniques in cvc4, an SMT solver with native support for string reasoning. An experimental evaluation with benchmarks from various applications shows that our new techniques allow cvc4 to significantly outperform other state-of-the-art solvers that target extended string constraints.

Our main contributions are:


In the remainder of this section, we discuss related work. In Sect. 2, we provide some background on the theory of strings and how solvers reduce extended functions. In Sects. 3, 4 and 5, we describe, respectively, our arithmetic-based, containment-based, and multiset-based simplification techniques. Section 6 describes our implementation of those techniques, and Sect. 7 presents our evaluation.

**Related Work.** Various approaches to solving constraints over extended string functions have been proposed. Saxena et al. [16] showed that constraints from the symbolic execution of JavaScript code contain a significant number of extended string functions, which underlines their importance. Their approach translates string constraints to bit-vector constraints, similar to other approaches based on bounded strings such as HAMPI [9]. Bjørner et al. [5] proposed native support for extended string operators in string solvers for scaling symbolic execution of .NET code. They reduce extended string functions to basic ones after getting bounds for string lengths from an integer solver. They also showed that constraints involving unbounded strings and replace are undecidable. PASS [11] reduces string constraints over extended functions to arrays. Z3-str and its successors [4,24,25] reduce extended string functions to basic functions eagerly during preprocessing. S3 [18] reduces recursive functions such as replace incrementally by splitting and unfolding. Its successor S3P [19] refines this reduction by pruning the resulting subproblems for better performance. cvc4 [3] reduces constraints with extended functions lazily and leverages context-dependent simplifications to simplify the reductions [15]. Trau [1] reduces certain extended functions, such as replace, to context-free membership constraints. Ostrich [7] implements a decision procedure for a subset of constraints that include extended string functions. The simplification techniques presented in this paper are agnostic to the underlying solving procedure, so they can be combined with all of these approaches.

#### **2 Preliminaries**

We work in the context of many-sorted first-order logic with equality and assume the reader is familiar with the notions of signature, term, literal, formula, and formal interpretation of formulas. We review a few relevant definitions in the following. A *theory* is a pair T = (Σ, **I**) where Σ is a signature and **I** is a class of Σ-interpretations, the *models* of T. We assume Σ contains the equality predicate ≈, interpreted as the identity relation, and the predicates ⊤ (for true) and ⊥ (for false). A Σ-formula ϕ is *satisfiable* (resp., *unsatisfiable*) *in* T if it is satisfied by some (resp., no) interpretation in **I**. We write ⊨<sub>T</sub> ϕ to denote that the Σ-formula ϕ is *T-valid*, i.e., is satisfied in every model of T. Two Σ-terms t<sub>1</sub> and t<sub>2</sub> are *equivalent in* T if ⊨<sub>T</sub> t<sub>1</sub> ≈ t<sub>2</sub>.

We consider an extended theory T<sub>S</sub> of strings and length equations, whose signature Σ<sub>S</sub> is given in Fig. 1 and whose models differ only on how they interpret variables.<sup>1</sup> We assume a fixed finite alphabet A of characters which includes the digits {0, ..., 9}. The signature includes the sorts Bool, Int, and Str denoting the Booleans, the integers (Z), and the Kleene closure of A (A<sup>∗</sup>), respectively. The top half of Fig. 1 includes the usual symbols of *linear* integer arithmetic, interpreted as expected, a *string literal* l for each word/string of A<sup>∗</sup>, a variadic function symbol con, interpreted as word concatenation, and a function symbol len, interpreted as the word length function. We write ε for the empty word and abbreviate len(s) as |s|. We use words over the characters a, b, and c, as in abca, as concrete examples of string literals.

<sup>1</sup> Our implementation supports a larger set of symbols, but for brevity, we only show the subset of the symbols used throughout this paper.

**Fig. 1.** Functions in signature Σ<sub>S</sub>. Str and Int denote strings and integers respectively.

We refer to the function symbols in the bottom half of the figure as *extended functions* and refer to terms containing them as *extended terms*. A *position* in a string l ∈ A<sup>∗</sup> is a non-negative integer n smaller than the length of l that identifies the (n + 1)th character of l, with 0 identifying the first character, 1 the second, and so on. For all models I of T<sub>S</sub>, all l, l<sub>1</sub>, l<sub>2</sub> ∈ A<sup>∗</sup>, and n, m ∈ Z, substr<sup>I</sup>(l, n, m) (the interpretation of substr in I applied to l, n, m) is the longest substring of l starting at position n with length at most m, or ε if n is an invalid position or m is not positive; contains<sup>I</sup>(l<sub>1</sub>, l<sub>2</sub>) is true if and only if l<sub>2</sub> is a substring of l<sub>1</sub>, with ε being a substring of every string; indexof<sup>I</sup>(l<sub>1</sub>, l<sub>2</sub>, n) is the position of the first occurrence of l<sub>2</sub> in l<sub>1</sub> at or after position n, n if l<sub>2</sub> is empty and 0 ≤ n ≤ |l<sub>1</sub>|, and −1 if n is an invalid position or if no such occurrence exists; replace<sup>I</sup>(l, l<sub>1</sub>, l<sub>2</sub>) is the result of replacing the first occurrence of l<sub>1</sub> in l by l<sub>2</sub>, l if l does not contain l<sub>1</sub>, or the result of prepending l<sub>2</sub> to l if l<sub>1</sub> is empty; str.to.int<sup>I</sup>(l) is the non-negative integer represented by l in decimal notation, or −1 if the string contains non-digit characters; int.to.str<sup>I</sup>(n) is the result of converting n to the corresponding string in decimal notation if n is non-negative, or ε otherwise. We write substr(t, u) as shorthand for the term substr(t, u, |t|), i.e., the suffix of t starting at position u.
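The semantics above can be made concrete as a small executable reference model. The following Python sketch is our own illustration, not cvc4's implementation; it models characters of A as Python characters, and it maps the empty string to −1 in str.to.int, an assumption (following SMT-LIB) that the definition above does not spell out. All function names are hypothetical.

```python
def is_position(l: str, n: int) -> bool:
    """A position in l is a non-negative integer smaller than len(l)."""
    return 0 <= n < len(l)

def substr(l: str, n: int, m: int) -> str:
    """Longest substring of l starting at n with length at most m; ε if n is invalid or m <= 0."""
    if not is_position(l, n) or m <= 0:
        return ""
    return l[n:n + m]

def suffix(l: str, n: int) -> str:
    """The shorthand substr(l, n), i.e. substr(l, n, |l|)."""
    return substr(l, n, len(l))

def contains(l1: str, l2: str) -> bool:
    """True iff l2 is a substring of l1 (ε is a substring of every string)."""
    return l2 in l1

def indexof(l1: str, l2: str, n: int) -> int:
    """First occurrence of l2 in l1 at or after n, with the corner cases given above."""
    if l2 == "":
        return n if 0 <= n <= len(l1) else -1
    if not is_position(l1, n):
        return -1
    return l1.find(l2, n)  # str.find already returns -1 when there is no occurrence

def replace(l: str, l1: str, l2: str) -> str:
    """Replace the first occurrence of l1 by l2; prepend l2 if l1 is empty."""
    return l.replace(l1, l2, 1)  # str.replace with count=1 covers all three cases

def str_to_int(l: str) -> int:
    """Decimal value of l, or -1 for non-digit characters (and, by assumption, for ε)."""
    if l == "" or any(ch not in "0123456789" for ch in l):
        return -1
    return int(l)

def int_to_str(n: int) -> str:
    """Decimal representation of a non-negative n, or ε otherwise."""
    return str(n) if n >= 0 else ""
```

For instance, `indexof("abc", "", 3)` returns 3 while `indexof("abc", "", 4)` returns −1, mirroring the empty-pattern case of indexof.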

Note that the semantics for replace and indexof correspond to the semantics in the current draft of the SMT-LIB standard for the theory of strings [17]; they are slightly different from the ones described in previous work [4,15,20].

#### **2.1 Solving Extended String Constraints (with Simplification)**

Various efficient solvers have been designed for the satisfiability problem for quantifier-free T<sub>S</sub>-constraints, including cvc4 [3], s3# [20] and z3str3 [4]. In this section, we give an overview of how these solvers process extended functions in practice.

Generally speaking, constraints involving extended functions are converted to basic ones through a series of reductions performed in an incremental fashion by the solver. Operators whose reduction requires universal quantification are dealt with by guessing upper bounds on the lengths of input strings or by lazily adding constraints that block models that do not satisfy extended string constraints.

*Example 1.* To determine the satisfiability of ¬contains(t, s), the application of contains is reduced to constraints that ensure that s is not a substring of t at any position. Assuming we have a fixed upper bound n on the length of t, the above constraint is equivalent to the finite conjunction substr(t, 0, |s|) ≉ s ∧ ··· ∧ substr(t, n, |s|) ≉ s. Each application of substr is then eliminated by introducing an equality that constrains a fresh variable x<sub>i</sub> to have the semantics of that substring. Thus, reducing the formula above results in

$$\bigwedge_{i=0}^{n} \Big(|t| \geq i + |s| \Rightarrow \big(x_i \not\approx s \land t \approx \mathsf{con}(x_i^{pre}, x_i, x_i^{post}) \land |x_i^{pre}| \approx i \land |x_i| \approx |s|\big)\Big)$$

where x<sub>i</sub>, x<sub>i</sub><sup>pre</sup>, x<sub>i</sub><sup>post</sup> are fresh string variables.<sup>2</sup> The above conjunction involves only string concatenation, string length, and equality, and thus can be handled by a string solver with support for word equations with length constraints.

The reduction in Example 1 introduces 5 · n theory literals over basic string functions and 3 · n string variables. A full reduction accounting for all corner cases of substr is even more complex and thus more expensive to process, even for small values of n. These performance challenges can be addressed by aggressive simplifications that *eliminate* extended functions using high-level reasoning, as shown in the next example.

*Example 2.* Consider an instance of the previous example where s = con(a, x) and t = con(b, substr(x, 0, n)). A full reduction of ¬contains(t, s) that eliminates all applications of substr, including those in t, introduces 5·n + 5 new theory literals and 3·n + 3 string variables. However, based on the semantics of contains, it is easy to see that ¬contains(t, s) is T<sub>S</sub>-valid: if t were to contain s, then s would have to occur in the portion of t after its first character b, since the first character of s is a. However, con(a, x) cannot be contained in substr(x, 0, n), since the length of the former is at least |x| + 1, while the length of the latter is at most |x|. A solver which recognizes that ¬contains(t, s) can be simplified to ⊤ in this case can avoid the reduction altogether.
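A bounded brute-force search can increase confidence in the validity claim of Example 2, although it is no substitute for the length argument above. This Python sketch assumes the alphabet {a, b, c}, |x| ≤ 4 and −1 ≤ n ≤ 6; these bounds and the helper names are illustrative choices only.

```python
from itertools import product

def substr(l: str, n: int, m: int) -> str:
    # ε unless n is a position in l and m is positive (semantics of Sect. 2).
    return l[n:n + m] if 0 <= n < len(l) and m > 0 else ""

def words(alphabet: str, max_len: int):
    # All strings over `alphabet` of length 0..max_len.
    for k in range(max_len + 1):
        for chars in product(alphabet, repeat=k):
            yield "".join(chars)

def example2_holds(max_len: int = 4, max_n: int = 6) -> bool:
    # Checks that con(a, x) never occurs in con(b, substr(x, 0, n)) on all small inputs.
    for x in words("abc", max_len):
        for n in range(-1, max_n + 1):
            t = "b" + substr(x, 0, n)
            s = "a" + x
            if s in t:  # contains(t, s) would make ¬contains(t, s) false
                return False
    return True
```

Calling `example2_holds()` searches every small instance for a counterexample; the length argument in Example 2 is what guarantees none exists for larger ones.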

We advocate for aggressive simplification techniques to improve the performance of string solvers for extended functions. In the next sections, we describe several classes of such techniques that can be applied to inputs as a preprocessing step or during solving as part of a context-dependent solving strategy [15]. We present them as sets R of rewrite rules of the form t →<sub>R</sub> s, where s is a (simplified) term equivalent to t in T<sub>S</sub>. We assume a deterministic application strategy for these rules, such that each term t rewrites to a unique *simplified form*, denoted by t↓, which is irreducible by the rules. We split our simplifications into four categories, presented in Figs. 4, 6, 7 and 8.<sup>3</sup>

#### **3 Arithmetic-Based String Simplification**

To simplify string terms, it is useful to establish relationships between quantities such as the lengths of strings. For example, contains(t, s) can be simplified to ⊥ for a particular s and t if it can be inferred that |s| is strictly greater than |t|. This section defines an inference system for such arithmetic relationships and the simplifications that it enables.

<sup>2</sup> This formula is a simplified form of the general reduction. The general reduction also expresses that i is a valid position in t and that the third argument of substr is non-negative [15].

<sup>3</sup> Some specialized rules have been omitted for space reasons.

We are interested in proving the T<sub>S</sub>-validity of formulas of the form u ≥ 0, where u is a Σ<sub>S</sub>-term of integer type. We describe an inference system as a set of rules for deriving judgments of the form u ≥ 0 and a specific rule application strategy we have implemented. The inference system is *sound* in the sense that ⊨<sub>T<sub>S</sub></sub> u ≥ 0 whenever u ≥ 0 is derivable in it. It is, however, *incomplete*, as it may fail to derive u ≥ 0 in some cases when ⊨<sub>T<sub>S</sub></sub> u ≥ 0. This incompleteness is by design, since proving the T<sub>S</sub>-validity of inequalities is generally expensive due to the NP-hardness of linear integer arithmetic. Without loss of generality, we require that the term u be in a simplified form, where terms of the form |l| with l a string literal of n characters are rewritten to n, terms of the form |con(t<sub>1</sub>, ..., t<sub>n</sub>)| are rewritten to |t<sub>1</sub>| + ··· + |t<sub>n</sub>|, and like monomials in arithmetic terms are combined in the usual way (e.g., 2·|x| + |x| is rewritten to 3·|x|).
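The normalization of length terms described above can be sketched as follows. The term encoding (nested tuples such as `("len", ("var", "x"))`, `("+", ...)`, and `("*", k, term)`) and the function name `normalize` are hypothetical choices for this illustration; the result maps each remaining atom to its coefficient, with the key `None` holding the constant.

```python
from collections import Counter

def normalize(u) -> dict:
    """Map each atom of u to its coefficient; the key None holds the constant part."""
    poly = Counter()
    _collect(u, 1, poly)
    return {k: c for k, c in poly.items() if c != 0}  # drop cancelled monomials

def _collect(u, coeff: int, poly: Counter) -> None:
    if isinstance(u, int):                  # integer constant
        poly[None] += coeff * u
    elif u[0] == "+":                       # ("+", u1, ..., un)
        for arg in u[1:]:
            _collect(arg, coeff, poly)
    elif u[0] == "*":                       # ("*", k, u') with k an integer constant
        _collect(u[2], coeff * u[1], poly)
    elif u[0] == "len" and u[1][0] == "lit":
        poly[None] += coeff * len(u[1][1])  # |l| -> number of characters of l
    elif u[0] == "len" and u[1][0] == "con":
        for part in u[1][1:]:               # |con(t1, ..., tn)| -> |t1| + ... + |tn|
            _collect(("len", part), coeff, poly)
    else:                                   # |x|, |extended term|, or an integer variable
        poly[u] += coeff                    # like monomials are combined here
```

For example, with `x = ("var", "x")`, `normalize(("+", ("*", 2, ("len", x)), ("len", x)))` yields `{("len", x): 3}`, the 3·|x| from the text.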

**Definition 1 (Polynomial Form).** *An arithmetic term* u *is in polynomial form if* u = m<sub>1</sub>·u<sub>1</sub> + ... + m<sub>n</sub>·u<sub>n</sub> + m*, where* m<sub>1</sub>, ..., m<sub>n</sub> *are non-zero integer constants,* m *is an integer constant, and each of* u<sub>1</sub>, ..., u<sub>n</sub> *is a unique term and one of the following:*


Given u in polynomial form, our inference system uses a set of over- and under-approximations for showing that u ≥ 0 holds in all models of T<sub>S</sub>. We define two auxiliary rewrite systems, denoted →<sub>O</sub> and →<sub>U</sub>. If u rewrites to v (in zero or more steps) in →<sub>O</sub>, written u →<sub>O</sub><sup>∗</sup> v, we say that v is *an over-approximation* of u. We can prove in that case that ⊨<sub>T<sub>S</sub></sub> v ≥ u. Dually, if u rewrites to v in →<sub>U</sub>, written u →<sub>U</sub><sup>∗</sup> v, we say that v is *an under-approximation* of u and can prove that ⊨<sub>T<sub>S</sub></sub> u ≥ v. Based on these definitions, the core of our inference system can be summarized by the single inference rule schema provided in Fig. 2, together with the conditional rewrite systems →<sub>O</sub> and →<sub>U</sub>, which are defined inductively in terms of the inference system and each other.

A majority of the rewrite rules have side conditions requiring the derivability of certain judgments in the same inference system. To improve their readability, we take some liberties with the notation and write, say, u<sub>1</sub> ≥ u<sub>2</sub> instead of u<sub>1</sub> − u<sub>2</sub> ≥ 0. For example, |substr(t, v, w)| is under-approximated by w if it can be inferred that the interval from v to v + w is a valid range of positions in string t, which is expressed by the side conditions v ≥ 0 and |t| ≥ v + w. Note that some arithmetic terms, such as |substr(t, v, w)|, can be approximated in *multiple* ways; hence the need for a strategy for choosing the best approximation for arithmetic string terms, described later. The rules for polynomials are written modulo associativity of + and state that a monomial m·v in them can be over- or under-approximated based on the sign of the coefficient m. For simplicity, we silently assume in the figure that basic arithmetic simplifications are applied after each rewrite step to put the right-hand side in polynomial form.

**Fig. 2.** Rules for arithmetic entailment based on under- and over-approximations computed for arithmetic terms containing extended string operators. We write t, s, r to denote string terms, u, u′, v, w to denote integer terms and m, n to denote integer constants.

*Example 3.* Let u be |replace(x, aa, b)|. Because |aa| ≥ |b|, the first case of the over-approximation rule for replace applies, and we get that u →<sub>O</sub> |x|. This reflects that the result of replacing the first occurrence, if any, of aa in x with b is no longer than x.

*Example 4.* Let u be the same as in the previous example and let v be −1·u + 2·|x|. Since u →<sub>O</sub> |x| and the coefficient of u in v is negative, we have that v →<sub>U</sub> −1·|x| + 2·|x|, which simplifies to |x|; moreover, |x| →<sub>U</sub> 0. Thus, v →<sub>U</sub><sup>∗</sup> 0 and so v ≥ 0. In other words, we can use the approximations to show that u is at most 2·|x|.

#### **3.1 A Strategy for Approximation**

The rewrite systems →<sub>O</sub> and →<sub>U</sub> allow for many possible derivations. Thus, it is important to devise a strategy that is efficient and succeeds often in practice. We use a greedy rule application strategy that favors rule applications leading to the cancellation of monomials. For example, consider the term |x| − |substr(y, 0, |x|)|, and observe that the subtrahend can be over-approximated either by |y| or by |x|. However, proving the T<sub>S</sub>-validity of |x| − |substr(y, 0, |x|)| ≥ 0 with the former over-approximation is impossible since |x| − |y| ≥ 0 does not hold in all models of T<sub>S</sub>. In contrast, the latter approximation produces |x| − |x| ≥ 0, which is trivially T<sub>S</sub>-valid.

**Fig. 3.** A greedy strategy for showing arithmetic entailments in the theory T<sub>S</sub>. We write negcoeff(u) to denote the set of terms whose coefficient is negative in u.

Recall that, given an arithmetic inequality u ≥ 0, our goal is to find a reduction u →<sub>U</sub><sup>∗</sup> n where n is a non-negative constant. Our strategy for choosing which rule of →<sub>U</sub> to apply to u is given in Fig. 3. We decompose u into three parts: the portion u<sub>x</sub> consisting of a sum of integer variables, the portion u<sub>ℓ</sub> consisting of a sum of lengths of string variables, and the remaining portion u<sub>s</sub>, which is a sum of monomials involving extended terms v<sub>1</sub>, ..., v<sub>q</sub> as defined in Definition 1.

Since there are multiple choices for how terms in u<sub>s</sub> are approximated, the strategy focuses primarily on this portion. In particular, we apply an approximation for one of the terms v<sub>i</sub>, under-approximating or over-approximating depending on the sign of its coefficient, and replace the monomial in u by its corresponding approximation. The choice of v<sub>i</sub> and v<sub>i</sub><sup>a</sup> is based on maximizing the likelihood that the overall derivation will produce a non-negative constant.

For a term u in polynomial form, let negcoeff(u) be the set of integer terms whose coefficient is negative in u, e.g., negcoeff(y<sub>1</sub> + −1·y<sub>2</sub>) = {y<sub>2</sub>}. Terms in this set can be seen as *obligations* for proving entailments in our derivations, since if y<sub>2</sub> ∈ negcoeff(u), our derivation must apply a rule that introduces a term with a positive coefficient for y<sub>2</sub>. In Fig. 3, we say that our choice of v<sub>i</sub> →<sub>U</sub> v<sub>i</sub><sup>a</sup> *avoids new terms* if it does not have the effect of adding any new terms to negcoeff(u), and *cancels existing terms* if it has the effect of removing terms from this set. If the portion u<sub>s</sub> is empty, we apply the rule |x<sub>j</sub>| →<sub>U</sub> 0 if there exists a monomial m′<sub>j</sub>·|x<sub>j</sub>| where m′<sub>j</sub> is positive. This rule is applied with lowest priority because these monomials may help to cancel negative terms introduced by the other steps.

Step 1 depends on knowing the set of possible one-step approximations v<sub>i</sub> →<sub>U</sub> v<sub>i</sub><sup>a</sup> and v<sub>i</sub> →<sub>O</sub> v<sub>i</sub><sup>a</sup> for terms from u. These are determined using the rules of Fig. 2. Whenever applicable, we break ties between rewrites in Step 1 by considering a fixed arbitrary ordering over extended terms.

*Example 5.* Let u be 1 + |t<sub>1</sub>| + |t<sub>2</sub>| − |x<sub>1</sub>|, where t<sub>1</sub> is substr(x<sub>2</sub>, 1, |x<sub>2</sub>| + |x<sub>4</sub>|) and t<sub>2</sub> is replace(x<sub>1</sub>, x<sub>2</sub>, x<sub>3</sub>). Step 1 of Str-Arith-Approx considers the possible approximations |t<sub>1</sub>| →<sub>U</sub> |x<sub>2</sub>| − 1 and |t<sub>2</sub>| →<sub>U</sub> |x<sub>1</sub>| − |x<sub>2</sub>|. Note that under-approximations are needed because the coefficients of |t<sub>1</sub>| and |t<sub>2</sub>| are positive. The first approximation is an instance of the third rule in Fig. 2, noting that both 1 ≥ 0 and 1 + |x<sub>2</sub>| + |x<sub>4</sub>| ≥ |x<sub>2</sub>| are derivable by a *basic* strategy that, wherever applicable, under-approximates string length terms as zero. Our strategy chooses the first approximation since it introduces no new negative coefficient terms, thus obtaining u →<sub>U</sub> |x<sub>2</sub>| + |t<sub>2</sub>| − |x<sub>1</sub>|. We now choose the approximation |t<sub>2</sub>| →<sub>U</sub> |x<sub>1</sub>| − |x<sub>2</sub>|, noting that it introduces no new negative coefficient terms and cancels an existing one, |x<sub>1</sub>|. After arithmetic simplification, we have derived u →<sub>U</sub><sup>∗</sup> 0, and hence u ≥ 0.
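The greedy choice in Step 1 can be illustrated with a small Python sketch. It simplifies the strategy under stated assumptions: polynomials are dictionaries from atom names to coefficients (the key `"1"` is the constant), atoms whose names start with `|` are treated as string lengths, and the candidate one-step approximations of each extended term are supplied by hand in the tables `under` and `over` instead of being derived from the rules of Fig. 2. The function names are hypothetical, and the loop assumes the supplied approximations never reintroduce a term already approximated.

```python
def add(p: dict, q: dict, scale: int = 1) -> dict:
    """Return p + scale * q, dropping zero coefficients."""
    r = dict(p)
    for k, c in q.items():
        r[k] = r.get(k, 0) + scale * c
    return {k: c for k, c in r.items() if c != 0}

def negcoeff(p: dict) -> set:
    """Non-constant atoms whose coefficient is negative (the proof obligations)."""
    return {k for k, c in p.items() if c < 0 and k != "1"}

def approximate(u: dict, under: dict, over: dict) -> dict:
    """Greedy rewriting in the style of Fig. 3 (simplified sketch)."""
    while True:
        candidates = []
        for v, m in sorted(u.items()):
            table = under if m > 0 else over  # the sign of m fixes the direction
            for approx in table.get(v, []):
                result = add(add(u, {v: m}, -1), approx, m)  # replace m*v by m*approx
                new = negcoeff(result) - negcoeff(u)
                cancelled = negcoeff(u) - negcoeff(result)
                # Rank: avoid new obligations first, then cancel existing ones,
                # then break ties by the fixed (sorted) order of atoms.
                candidates.append(((bool(new), not cancelled, v), result))
        if candidates:
            u = min(candidates, key=lambda cand: cand[0])[1]
            continue
        # Lowest priority: under-approximate a positive string length |x| by 0.
        lengths = [k for k, c in sorted(u.items()) if k.startswith("|") and c > 0]
        if not lengths:
            return u
        u = add(u, {lengths[0]: u[lengths[0]]}, -1)

def entails_nonneg(u: dict, under: dict, over: dict) -> bool:
    """True if u is rewritten to a non-negative constant, i.e. u >= 0 is derived."""
    rest = approximate(u, under, over)
    return set(rest) <= {"1"} and rest.get("1", 0) >= 0
```

On Example 5, encoded as `{"1": 1, "|t1|": 1, "|t2|": 1, "|x1|": -1}` with `under = {"|t1|": [{"|x2|": 1, "1": -1}], "|t2|": [{"|x1|": 1, "|x2|": -1}]}`, the sketch approximates |t<sub>1</sub>| first because that choice adds no new obligation, then approximates |t<sub>2</sub>|, which cancels |x<sub>1</sub>| and leaves 0.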

One can show that our strategy is sound, terminating, and deterministic. This means that applying Str-Arith-Approx to completion produces a unique rewrite chain of the form u →<sub>U</sub> u<sub>1</sub> →<sub>U</sub> ... →<sub>U</sub> u<sub>n</sub> for a finite n, where each step is an application of one of the rewrite rules from Fig. 2.

#### **3.2 Simplification Rules with Arithmetic Side Conditions**

We use the inference system from the previous section for simplifications of string terms with arithmetic side conditions. Figure 4 summarizes those simplifications.

The first rule rewrites a string equality to ⊥ if one of the two sides can be inferred to be strictly longer than the other. In the second rule, if one side of an equality, con(s, r, q), is such that the sum of the lengths of s and q alone can be shown to be greater than or equal to the length of the other side, then r must be empty. The third rule recognizes that string containment reduces to string equality when it can be inferred that string s is at least as long as the string t that must contain it. The next rule captures the fact that substring simplifies to the empty string if it can be inferred that its position v is not within bounds, or its length w is not positive. In the figure, we write that rule with a disjunctive side condition; this is shorthand to denote that we can pick any disjunct and show that it holds assuming the negation of the other disjuncts. We can use those assumptions to perform substitutions to simplify the derivation. Concretely, to show u<sub>1</sub> ≥ u<sub>2</sub> ∨ ... ∨ u ≉ u′, it is sufficient to infer (u<sub>1</sub> ≥ u<sub>2</sub>)[u ↦ u′]. We demonstrate this with an example.

$$\begin{array}{lll}
t \approx s & \rightarrow \bot & \text{if } \vdash |t| \geq |s| + 1\\
t \approx \mathsf{con}(s, r, q) & \rightarrow t \approx \mathsf{con}(s, q) \land r \approx \epsilon & \text{if } \vdash |s| + |q| \geq |t|\\
\mathsf{contains}(t, s) & \rightarrow t \approx s & \text{if } \vdash |s| \geq |t|\\
\mathsf{substr}(t, v, w) & \rightarrow \epsilon & \text{if } \vdash v \geq |t| \lor 0 > v \lor 0 \geq w\\
\mathsf{substr}(\mathsf{con}(t, s), v, w) & \rightarrow \mathsf{substr}(s, v - |t|, w) & \text{if } \vdash v \geq |t|\\
\mathsf{substr}(\mathsf{con}(s, t), v, w) & \rightarrow \mathsf{substr}(s, v, w) & \text{if } \vdash |s| \geq v + w\\
\mathsf{substr}(\mathsf{con}(t, s), 0, w) & \rightarrow \mathsf{con}(t, \mathsf{substr}(s, 0, w - |t|)) & \text{if } \vdash w \geq |t|\\
\mathsf{indexof}(t, s, v) & \rightarrow \mathsf{ite}(\mathsf{substr}(t, v) \approx s, v, -1) & \text{if } \vdash v + |s| \geq |t|
\end{array}$$

**Fig. 4.** String simplification rules. Letters t, s, r, q denote string terms; v, w denote integer terms.

*Example 6.* Consider the term substr(t, |t| + w, w). Our rules may simplify this term to ε by inferring that its start position (|t| + w) is not within the bounds of t if we assume that its length (w) is positive. In detail, assume that w > 0 (the negation of the last disjunct in the side condition of the fourth rule), which is equivalent to w ≈ |x| + 1, where x is a fresh string variable and |x| denotes an unknown non-negative quantity. It is sufficient to derive the formula obtained by replacing all occurrences of w by |x| + 1 in the disjunct |t| + w ≥ |t| to show that the start position of our term is out of bounds. After simplification, we obtain |x| + 1 ≥ 0, which is trivial to derive.

The next two rules in Fig. 4 apply if we can infer respectively that the start position of the substring comes strictly after a prefix t or that the end position of the substring comes strictly before a suffix t of the first argument string. In either case, t can be dropped.
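The substr rules of Fig. 4 discussed so far can be sanity-checked by exhaustive testing over short strings. The Python sketch below assumes the alphabet {a, b}, strings of length at most 3 and integer arguments in [−2, 5]; passing such a bounded check is evidence of soundness, not a proof, and the function names are hypothetical.

```python
from itertools import product

def substr(l: str, n: int, m: int) -> str:
    # ε unless n is a position in l and m is positive (semantics of Sect. 2).
    return l[n:n + m] if 0 <= n < len(l) and m > 0 else ""

def words(max_len: int, alphabet: str = "ab") -> list:
    return ["".join(cs) for k in range(max_len + 1) for cs in product(alphabet, repeat=k)]

def check_fig4_substr_rules(max_len: int = 3) -> bool:
    ws, ints = words(max_len), range(-2, 6)
    for t, s in product(ws, repeat=2):
        for w in ints:
            # substr(t, |t| + w, w) -> ε (Example 6)
            assert substr(t, len(t) + w, w) == ""
            # substr(con(t, s), 0, w) -> con(t, substr(s, 0, w - |t|)) if w >= |t|
            if w >= len(t):
                assert substr(t + s, 0, w) == t + substr(s, 0, w - len(t))
            for v in ints:
                # substr(con(t, s), v, w) -> substr(s, v - |t|, w) if v >= |t|
                if v >= len(t):
                    assert substr(t + s, v, w) == substr(s, v - len(t), w)
                # substr(con(s, t), v, w) -> substr(s, v, w) if |s| >= v + w
                if len(s) >= v + w:
                    assert substr(s + t, v, w) == substr(s, v, w)
    return True
```

Each assertion encodes one rule with its side condition as the guard, so a failing assertion would point directly at a rule instance that changes the meaning of a term.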

*Example 7.* Let t be substr(con(x<sub>1</sub>, replace(x<sub>2</sub>, x<sub>3</sub>, x<sub>4</sub>)), 0, w), where w is |x<sub>1</sub>| − |x<sub>2</sub>|. We have that t → substr(x<sub>1</sub>, 0, w), noting that |x<sub>1</sub>| ≥ 0 + |x<sub>1</sub>| − |x<sub>2</sub>|. In other words, only the first component x<sub>1</sub> of the string concatenation is relevant to the substring since its end point must occur before the end of x<sub>1</sub>.

The final rule for substr shows that a prefix of a substring can be pulled upwards if the start position is zero and we can infer that the substring is guaranteed to include at least a prefix string t. Finally, if we can infer that the last position of s in t starting from position v is at or beyond the end of t, then the indexof term can be rewritten as an if-then-else (ite) term that checks whether s is a suffix of t.

#### **4 Containment-Based String Simplification**

This section provides an overview of simplifications that are based on reasoning about the containment relationship between strings. We describe an inference system for deriving when one string is definitely contained or not contained in another. Following the notation from the last section, we write t ⊒ s to denote the judgment of our inference system, denoting that string t contains string s in all models of T<sub>S</sub>. Conversely, we write t ⋣ s to denote that string t does not contain string s. We write t ⊒<sub>p</sub> s (resp., t ⊒<sub>s</sub> s) to denote the judgment indicating that s must be a prefix (resp., suffix) of t.


**Fig. 5.** Inferences for string containment ⊒, is-prefix ⊒<sub>p</sub> and is-suffix ⊒<sub>s</sub>.

Rules for inferring judgments of these forms are given in Fig. 5. Like our rules for arithmetic, these rules are solely based on the syntactic structure of terms, so inferences in this system can be computed statically. Both the assumptions and conclusions of the rules assume associativity of string concatenation with identity element ε, that is, con(t, s) may refer to a term of the form con(con(t<sub>1</sub>, t<sub>2</sub>), s) = con(t<sub>1</sub>, t<sub>2</sub>, s) or alternatively to con(ε, s) = s. Most of the rules are straightforward. The inference system has special rules for substring terms substr(t, v, w), using arithmetic entailments from Sect. 3 to show prefix and suffix relationships with the base string t. For negative containment, the rules of the inference system together can show that a (possibly non-constant) string cannot occur in a constant string by reasoning that its characters cannot appear in order in that string. We write l<sub>1</sub> \ l<sub>2</sub> to denote the empty string if l<sub>1</sub> does not contain l<sub>2</sub>, or the result of removing the smallest prefix of l<sub>1</sub> that contains l<sub>2</sub> from l<sub>1</sub> otherwise.

*Example 8.* Let t be abcab and let s be con(b, x, a, y, c). String s is not contained in t for any value of x, y. We derive t ⋣ s using two applications of the rightmost rule for negative containment in Fig. 5, noting abcab \ b = cab, cab \ a = b, and b does not contain c. In other words, the containment does not hold since the characters b, a and c cannot be found in order in the constant abcab.
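The ordered-character argument of Example 8 can be sketched as a greedy left-to-right scan using the l<sub>1</sub> \ l<sub>2</sub> operator. In this Python illustration, which is not cvc4's implementation, a concatenation is given as a list whose entries are either literals (strings) or `None` for a variable; a result of `True` means non-containment is guaranteed, while `False` is inconclusive. Adjacent literals are checked separately, which is weaker than checking them as one contiguous literal but still sound. Function names are hypothetical.

```python
def remove_through(l1: str, l2: str) -> str:
    """l1 \\ l2: drop the shortest prefix of l1 containing l2 (ε if l2 does not occur)."""
    i = l1.find(l2)
    return "" if i < 0 else l1[i + len(l2):]

def definitely_not_contained(t: str, pieces: list) -> bool:
    """True when the literal pieces of con(...) cannot appear in order in the constant t."""
    rest = t
    for p in pieces:
        if p is None:        # a variable may match anything, including ε
            continue
        if p not in rest:    # this literal cannot occur after the previous ones
            return True
        rest = remove_through(rest, p)  # keep only what follows its leftmost occurrence
    return False             # inconclusive: containment may or may not hold
```

For Example 8, `definitely_not_contained("abcab", ["b", None, "a", None, "c"])` follows the same steps as the derivation above: "abcab" becomes "cab", then "b", which does not contain "c".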

#### **4.1 Simplification Rules Based on String Containment**

Figure 6 gives rules for simplifying extended function terms based on the aforementioned judgments pertaining to string containment. First, equalities can be rewritten to false and applications of contains can be rewritten to a constant based on the appropriate judgment of our inference system. Applications of indexof can be simplified to −1 if it can be shown that the second argument does not appear in the suffix of the first argument starting at the position given by the third argument. The next two rules reason about cases where the second argument s definitely occurs in the first argument starting from position v. In this case, if we additionally know that s occurs within (beyond) a prefix t of the first argument, then the suffix r (prefix t) can be dropped, where the start position and the return value of the result are modified accordingly. If we know s is a prefix of the first argument at position v, then the result is v if indeed v is in the bounds of t. Notice that the latter condition is necessary to handle the case where s is the empty string. The three rules for replace are analogous. First, replace rewrites to the first argument if we know it does not contain the second argument s. If we know s is definitely contained in a prefix of the first argument, then we can pull the remainder of that string upwards. Finally, if we know s is a prefix of the first argument, then we can replace that prefix with r while concatenating the remainder. We use the term substr(t, |s|) to denote the remainder after the replacement for the sake of brevity, although this term typically does not involve extended functions after simplification, e.g., replace(con(x, y), x, z) → con(z, y), noting that (substr(con(x, y), |x|))↓ = y, or replace(ab, a, x) → con(x, b), noting that (substr(ab, |a|))↓ = b.

**Fig. 6.** Simplification rules based on string containment.
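The two replace examples above can be confirmed on small inputs with the following Python sketch. It models replace with `str.replace(old, new, 1)`, which also prepends the replacement when the pattern is empty, matching the semantics of Sect. 2. The alphabet {a, b}, the length bound, and the function name are arbitrary illustrative choices.

```python
from itertools import product

def replace(l: str, l1: str, l2: str) -> str:
    # First occurrence only; for l1 == "" Python's str.replace(..., 1) prepends l2.
    return l.replace(l1, l2, 1)

def check_replace_prefix_examples(max_len: int = 3) -> bool:
    ws = ["".join(cs) for k in range(max_len + 1) for cs in product("ab", repeat=k)]
    for x, y, z in product(ws, repeat=3):
        # replace(con(x, y), x, z) -> con(z, y), since x is a prefix of con(x, y)
        assert replace(x + y, x, z) == z + y
    for x in ws:
        # replace(ab, a, x) -> con(x, b)
        assert replace("ab", "a", x) == x + "b"
    return True
```

The first loop exercises the prefix rule in full generality on small strings, including the empty-pattern case x = ε, where the result con(z, y) coincides with prepending z.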

#### **4.2 Simplifications Based on Equivalence of String Containment**

We further refine our approach based on inferring when one containment is *equivalent* to another one. For example, con(a, x) is contained in con(b, y) if and only if con(a, x) is contained in y alone. We introduce simplifications for such equivalences by reasoning about the maximal overlap between two strings.

We adapt and extend the notation given in previous work [15]. Given string literals l<sub>1</sub> and l<sub>2</sub>, the *sufficient left overlap* of l<sub>1</sub> and l<sub>2</sub>, written l<sub>1</sub> \<sub>l</sub> l<sub>2</sub>, is the largest suffix of l<sub>1</sub> that is a prefix of l<sub>2</sub> or has l<sub>2</sub> as a prefix. For example, we have abc \<sub>l</sub> cd = c, abc \<sub>l</sub> b = bc, and abc \<sub>l</sub> ba = ε. We extend this definition to arbitrary strings s such that l<sub>1</sub> \<sub>l</sub> s is equivalent to l<sub>1</sub> \<sub>l</sub> l<sub>2</sub> for the largest constant prefix l<sub>2</sub> of s, where l<sub>2</sub> is the empty string if s does not have a constant prefix. For example, we have abc \<sub>l</sub> con(cde, y) = c, abc \<sub>l</sub> con(b, y) = bc, and abc \<sub>l</sub> con(a, y) = abc. We define the dual operator *sufficient right overlap*, written l<sub>1</sub> \<sub>r</sub> l<sub>2</sub>, which is the largest prefix of l<sub>1</sub> that is a suffix of l<sub>2</sub> or has l<sub>2</sub> as a suffix, e.g., abc \<sub>r</sub> b = ab, and extend this to arbitrary strings in an analogous way. The sufficient left (resp., right) overlap operator can be used to determine how much of a constant string prefix l<sub>1</sub> (resp., suffix) can be safely removed from a string without impacting whether it contains another string.
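Both overlap operators can be computed by scanning candidate suffixes (resp., prefixes) of l<sub>1</sub> from longest to shortest. The Python sketch below does this for string literals; `check_drop_prefix` then tests on short strings, assuming s = con(c, x) with c a constant prefix, that cutting a leading literal l<sub>1</sub> down to its sufficient left overlap with c does not change containment. The function names and bounds are hypothetical.

```python
from itertools import product

def left_overlap(l1: str, l2: str) -> str:
    """Largest suffix of l1 that is a prefix of l2 or has l2 as a prefix."""
    for i in range(len(l1) + 1):          # l1[i:] ranges from longest to shortest suffix
        suffix = l1[i:]
        if l2.startswith(suffix) or suffix.startswith(l2):
            return suffix
    return ""  # not reached: the empty suffix is a prefix of every l2

def right_overlap(l1: str, l2: str) -> str:
    """Largest prefix of l1 that is a suffix of l2 or has l2 as a suffix."""
    for j in range(len(l1), -1, -1):      # l1[:j] ranges from longest to shortest prefix
        prefix = l1[:j]
        if l2.endswith(prefix) or prefix.endswith(l2):
            return prefix
    return ""  # not reached: the empty prefix is a suffix of every l2

def check_drop_prefix(max_len: int = 3, alphabet: str = "ab") -> bool:
    """contains(con(l1, y), s) is unchanged when l1 is cut to its left overlap with c."""
    ws = ["".join(cs) for k in range(max_len + 1) for cs in product(alphabet, repeat=k)]
    for l1, c, x, y in product(ws, repeat=4):
        s = c + x
        assert (s in l1 + y) == (s in left_overlap(l1, c) + y)
    return True
```

With `l1 = "b"` and `c = "a"` the left overlap is ε, which is exactly the equivalence between con(a, x) being contained in con(b, y) and in y alone.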


**Fig. 7.** Simplification rules based on equivalence of string containment. We write l, l1, l<sup>2</sup> to denote string literals, v, w to denote integer terms and t, s to denote string terms.

The rules in Fig. 7 simplify extended terms by considering string overlaps. The first two rules drop parts of string literals from the suffix or prefix of their first arguments. The two rules for indexof are similar: a suffix of the first argument can be dropped if it does not contribute to whether it contains the second argument. A prefix of an indexof term can be dropped if it does not contribute to containment, but only in the case where we know the second argument is definitely contained in the first argument. This is to guard against the case where the entire indexof term returns −1. The rules for replace are similar to those for contains, except that the suffix (resp., prefix) of the first argument is pulled upwards instead of being dropped.

#### **5 Multiset-Based String Simplification**

Next, we introduce simplifications based on reasoning about strings as multisets, i.e., collections of unordered characters. Such reasoning is sufficient for showing that equalities like con(a, x) ≈ con(x, b) are equivalent to ⊥, since the left side of the equality contains exactly one more occurrence of character a than the right-hand side. Similar to the arithmetic reasoning from Sect. 3, we use approximations when reasoning about strings as multisets. We define the *multiset abstraction* of t, written M<sub>t</sub>, as the multiset {t<sub>1</sub>, ..., t<sub>n</sub>} where t is equivalent to con(t<sub>1</sub>, ..., t<sub>n</sub>) and all constants in this set are characters. For example, M<sub>con(aba, x)</sub> = {a, a, b, x}. We define a rewrite system →<sub>O</sub><sup>M</sup> over strings where a rewritten string over-approximates the original string in the following sense: if t →<sub>O</sub><sup>M</sup> s, then for all models of T<sub>S</sub> and any character c, the number of occurrences of c in the strings in M<sub>s</sub> is greater than or equal to the number of occurrences in the strings in M<sub>t</sub>.

Figure 8 lists the rules for the rewrite system $\rightharpoonup_{O}^{\mathcal{M}}$ and the simplifications based on multiset reasoning. Given a predicate $\mathsf{contains}(t, s)$, if over-approximating $t$ with respect to the rules of $\rightharpoonup_{O}^{\mathcal{M}}$ results in a string $r$, and it can be determined that $s$ contains strictly more occurrences of some character $c$ than $r$, then $s$ cannot be contained in $t$. To establish this, we check whether the multiset difference of $\mathcal{M}_s$ and $\mathcal{M}_r$ contains $c$ and, conversely, whether the difference of $\mathcal{M}_r$ and $\mathcal{M}_s$ contains only character constants distinct from $c$. In the second rule, if one side of an equality can be determined to contain *only* a character $c$, then one occurrence of that character can be dropped from both sides of the equality, since the relative position of that character does not matter. The three rules for $\rightharpoonup_{O}^{\mathcal{M}}$ state that the multiset abstraction of a term of the form $\mathsf{substr}(t, v, w)$ can be over-approximated as the entire string $t$; a term $\mathsf{replace}(t, s, r)$ can be over-approximated as a string containing both $t$ and $r$; and over-approximation can be applied to the children of $\mathsf{con}$ terms.

$$\begin{array}{ll}
\mathsf{contains}(t,s) \to \bot & \text{if } t \mathrel{(\rightharpoonup_{O}^{\mathcal{M}})^{*}} r,\ \mathcal{M}_{s} \setminus \mathcal{M}_{r} = \{c, s_{1},\ldots,s_{n}\} \text{ and } \mathcal{M}_{r} \setminus \mathcal{M}_{s} = \{c_{1},\ldots,c_{m}\} \\
\mathsf{con}(t,c,s) \approx \mathsf{con}(q,c,r) \to \mathsf{con}(t,s) \approx \mathsf{con}(q,r) & \text{if } \mathsf{con}(t,c,s) \mathrel{(\rightharpoonup_{O}^{\mathcal{M}})^{*}} p \text{ and } \mathcal{M}_{p} = \{c,\ldots,c\} \\[6pt]
\mathsf{substr}(t,v,w) \rightharpoonup_{O}^{\mathcal{M}} t & \\
\mathsf{replace}(t,s,r) \rightharpoonup_{O}^{\mathcal{M}} \mathsf{con}(t,r) & \\
\mathsf{con}(t,s,r) \rightharpoonup_{O}^{\mathcal{M}} \mathsf{con}(t,q,r) & \text{if } s \rightharpoonup_{O}^{\mathcal{M}} q
\end{array}$$

**Fig. 8.** Simplification rules based on multiset reasoning. We write c, c1,... to denote characters, v, w to denote integer terms, and t, s, r, q, p to denote string terms.

*Example 9.* We have that $\mathsf{con}(aaa,\mathsf{substr}(x, y_1, y_2)) \approx \mathsf{con}(x, b) \to \bot$ by noting that $\mathsf{con}(aaa,\mathsf{substr}(x, y_1, y_2)) \mathrel{(\rightharpoonup_{O}^{\mathcal{M}})^{*}} \mathsf{con}(aaa, x)$, $\mathcal{M}_{\mathsf{con}(aaa,x)} = \{a, a, a, x\}$ and $\mathcal{M}_{\mathsf{con}(x,b)} = \{b, x\}$. The difference of the latter with the former is $\{b\}$, and that of the former with the latter is $\{a, a, a\}$. Thus, the right side of the equality contains at least one more occurrence of $b$ than the left side; hence, the equality is equivalent to false.
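The first rule of Fig. 8, together with the three over-approximation rules, can be sketched in Python as follows. The tuple encoding of terms is hypothetical and chosen only for illustration (it is not how cvc4 represents terms); characters from literals are tagged as `("chr", c)` so that they can be told apart from opaque elements such as variables.

```python
from collections import Counter

# Hypothetical term encoding (illustration only, not cvc4's):
#   ("lit", "abc"), ("var", "x"), ("con", t1, ..., tn),
#   ("substr", t, v, w), ("replace", t, s, r)


def over_approx(t):
    """Apply the three over-approximation rules of Fig. 8 exhaustively."""
    kind = t[0]
    if kind == "substr":   # substr(t, v, w) ⇀ t
        return over_approx(t[1])
    if kind == "replace":  # replace(t, s, r) ⇀ con(t, r)
        return ("con", over_approx(t[1]), over_approx(t[3]))
    if kind == "con":      # rewrite the children of con
        return ("con",) + tuple(over_approx(c) for c in t[1:])
    return t               # literals and variables are unchanged


def abstraction(t):
    """Multiset abstraction: flatten con, split literals into ("chr", c)
    elements, keep every other term as one opaque element."""
    kind = t[0]
    if kind == "con":
        m = Counter()
        for child in t[1:]:
            m += abstraction(child)
        return m
    if kind == "lit":
        return Counter(("chr", ch) for ch in t[1])
    return Counter([t])


def is_char(elem):
    return elem[0] == "chr"


def contains_is_false(t, s):
    """contains(t, s) -> false if, for the over-approximation r of t,
    M_s minus M_r has some character and M_r minus M_s has only characters."""
    m_r = abstraction(over_approx(t))
    m_s = abstraction(s)
    extra_in_s = m_s - m_r
    extra_in_r = m_r - m_s
    return any(is_char(e) for e in extra_in_s) and all(
        is_char(e) for e in extra_in_r
    )


# Example 9: con(aaa, substr(x, y1, y2)) versus con(x, b)
x, y1, y2 = ("var", "x"), ("var", "y1"), ("var", "y2")
t = ("con", ("lit", "aaa"), ("substr", x, y1, y2))
s = ("con", x, ("lit", "b"))
print(contains_is_false(t, s))  # True
```

Since an equality $t \approx s$ implies $\mathsf{contains}(t, s)$, refuting the containment also refutes the equality of Example 9.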

#### **6 Implementation**

We implemented the above simplification rules and others in the DPLL-based SMT solver cvc4, which implements a theory solver for a basic fragment of word equations with length, several other theory solvers, and reduction techniques for extended string functions as described in Sect. 2.1. Our simplification rules are run in a *preprocessing* pass as well as an *inprocessing* pass during solving. For the latter, we use a context-dependent simplification strategy that infers when an extended string constraint, e.g., $\mathsf{contains}(t, s)$, simplifies to $\bot$ based on other assertions, e.g., equalities involving $s$. Our simplification techniques do not affect the core procedure for the theory of strings, nor the compatibility of the string solver with other theories. In total, our implementation is about 3,500 lines of C++ code. We cache the results of the simplifications and the approximation-based arithmetic entailments to amortize their costs.

**Additional Simplification Rules.** The simplification rules in this paper are a subset of the rules in the implementation. We omit other uncategorized rules for lack of space. Many of these apply to specific term patterns, such as cases where two nested applications of substr can be combined, cases where an application of replace can be eliminated by case splitting, and other cases like $\mathsf{con}(t, t) \approx a \to \bot$. An example of such rules is $\mathsf{contains}(\mathsf{replace}(t, w_1, w_2), w_3) \to \mathsf{contains}(t, w_3)$ if $w_3$ does not overlap with either $w_1$ or $w_2$, because the replace does not change whether $t$ contains $w_3$. Another class of rules only applies to strings of length one, because such strings cannot span multiple components of a concatenation, e.g., $\mathsf{contains}(\mathsf{con}(t, s), c) \to \mathsf{contains}(t, c) \lor \mathsf{contains}(s, c)$ where $c$ is a character. Finally, there are rewrites that benefit from multiple techniques presented in this paper. For example, we have a rewrite that splits string equations into multiple smaller equations if it can determine that prefixes must have the same length: $\mathsf{con}(a, t, s) \approx \mathsf{con}(t, b, r) \to \mathsf{con}(a, t) \approx \mathsf{con}(t, b) \land s \approx r \to \bot$.
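Why the last containment rule is restricted to characters can be checked by brute force. The Python sketch below exhaustively compares both sides of $\mathsf{contains}(\mathsf{con}(t, s), c) \Leftrightarrow \mathsf{contains}(t, c) \lor \mathsf{contains}(s, c)$ over short strings on the alphabet {a, b}; the bound and alphabet are arbitrary choices, so this is a sanity check rather than a proof.

```python
from itertools import product


def strings(alphabet="ab", max_len=3):
    """All strings over `alphabet` of length at most `max_len`."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def split_rule_holds(pattern, max_len=3):
    """Check contains(con(t, s), pattern) <=> contains(t, pattern) or
    contains(s, pattern) for all t, s over {a, b} up to max_len."""
    pool = list(strings(max_len=max_len))
    return all(
        (pattern in t + s) == (pattern in t or pattern in s)
        for t in pool
        for s in pool
    )


print(split_rule_holds("a"))   # True: a single character cannot span t and s
print(split_rule_holds("ab"))  # False: e.g. t = "a", s = "b"
```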

**Validating Simplification Rules.** The correctness of our simplification techniques is critical to the soundness of the overall solver. Due to the sophistication and breadth of those techniques, it is challenging to formally verify our implementation. As a pragmatic alternative, we periodically test our implementation using a testing infrastructure we developed for this purpose. We found this to be critical in our development process. Our testing infrastructure allows the developer to specify a context-free grammar in the syntax-guided synthesis format [2]. We generate all terms $t$ in this grammar up to a fixed size and test the equivalence of $t$ and its simplified form $t{\downarrow}$ on a set of randomly generated points. The most recent run of this system on two grammars (one for extended string terms and another for string predicates) up to a term size of three validated 319,867 simplifications of string terms and 188,428 simplifications of string predicates on 1,000 sample points. This run took 924 s for string terms and 971 s for string predicates using the same hardware as in Sect. 7.
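A toy Python version of this testing loop, under heavy simplifying assumptions: the grammar (S ::= x | y | "a" | "b" | con(S, S)), the term encoding and the `simplify` function, which only flattens concatenations and merges adjacent literals, are hypothetical stand-ins rather than the cvc4 infrastructure or its rewriter. The sketch enumerates all terms up to a node count, simplifies each one, and compares the two on randomly sampled string assignments.

```python
import random

# Hypothetical toy grammar: S ::= x | y | "a" | "b" | con(S, S)
LEAVES = [("var", "x"), ("var", "y"), ("lit", "a"), ("lit", "b")]


def terms(size):
    """Enumerate all terms with exactly `size` nodes."""
    if size == 1:
        yield from LEAVES
        return
    for left_size in range(1, size - 1):
        for left in terms(left_size):
            for right in terms(size - 1 - left_size):
                yield ("con", left, right)


def evaluate(t, env):
    """Evaluate a term under an assignment of strings to variables."""
    if t[0] == "var":
        return env[t[1]]
    if t[0] == "lit":
        return t[1]
    return evaluate(t[1], env) + evaluate(t[2], env)


def simplify(t):
    """Toy rewriter: flatten nested con and merge adjacent literals."""
    parts = []

    def flatten(u):
        if u[0] == "con":
            flatten(u[1])
            flatten(u[2])
        elif u[0] == "lit" and parts and parts[-1][0] == "lit":
            parts[-1] = ("lit", parts[-1][1] + u[1])
        else:
            parts.append(u)

    flatten(t)
    result = parts[-1]
    for part in reversed(parts[:-1]):  # rebuild a right-nested con
        result = ("con", part, result)
    return result


def random_string(rng, max_len=3):
    return "".join(rng.choice("ab") for _ in range(rng.randint(0, max_len)))


def validate(max_size, samples=100, seed=0):
    """Check t and simplify(t) agree on random points; return #terms checked."""
    rng = random.Random(seed)
    checked = 0
    for size in range(1, max_size + 1):
        for t in terms(size):
            t_simplified = simplify(t)
            for _ in range(samples):
                env = {v: random_string(rng) for v in ("x", "y")}
                if evaluate(t, env) != evaluate(t_simplified, env):
                    raise AssertionError(f"unsound: {t} -> {t_simplified} on {env}")
            checked += 1
    return checked


print(validate(3))  # 20 terms: 4 leaves and 16 con(leaf, leaf)
```

As in the approach described above, agreement on sample points does not prove a rewrite sound, but a single disagreement pinpoints an unsound one together with a concrete counterexample.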

#### **7 Evaluation**

We evaluate the impact of each simplification technique as implemented in cvc4 on three benchmark sets that use extended string operators: CMU, a dataset obtained from symbolic execution of Python code [15]; TermEq, a benchmark set consisting of the verification of term equivalences over strings [14]; and Slog, a benchmark set extracted from vulnerability testing of web applications [22]. The Slog set uses the replace function extensively but does not contain other extended functions. We also evaluate the impact on Aplas, a set of handcrafted benchmarks involving looping word equations [10] (string equalities whose left and right sides have variables in common).

We compare cvc4 with z3 commit 9cb1a0f [8],<sup>4</sup> a state-of-the-art string solver. Additionally, we compare against Ostrich on the Slog benchmarks but not on the other sets, because it does not support some functions such as contains and indexof. We omit a comparison with z3str3 4.8.4 because we found multiple issues in its latest release, including wrong answers, which we have reported to the authors. We also omit a comparison with s3# due to differing semantics. We compare four configurations of cvc4: **all**, which enables all optimizations; **-arith**, which disables arithmetic-based simplification techniques (discussed in Sect. 3); **-contain**, which disables containment-based simplification techniques (discussed in Sect. 4); and **-msets**, which disables multiset-based simplification techniques (discussed in Sect. 5). Additionally, to test the applicability of our techniques to other solvers, we test the effect of our simplifications on z3 by using cvc4 to generate simplified benchmarks and then running z3 on those benchmarks. We generate two sets of benchmarks simplified by cvc4: one with the simplification techniques presented in this paper (z3<sub>f</sub>) and one without them (z3<sub>b</sub>).

<sup>4</sup> 9cb1a0f is newer than the current release 4.8.4 and includes several fixes for critical issues.

**Table 1.** Number of solved problems per benchmark set. Best results are in **bold**. Gray cells indicate benchmark sets not supported by a solver. "R%" indicates the reduction of extended string functions during preprocessing. All benchmarks ran with a timeout of 600 s.


We ran all benchmarks on a cluster equipped with Intel E5-2637 v4 CPUs running Ubuntu 16.04 and dedicated one core, 8 GB RAM, and 600 s to each job. Table 1 summarizes the number of solved instances for each configuration and the baseline solvers, grouped by benchmark set. We remark that the average reduction of extended string functions (with all simplification techniques enabled), shown in column "R%", is significant on all benchmark sets. The scatter plots in Fig. 9 detail the effects of disabling each family of simplifications. They distinguish between satisfiable and unsatisfiable instances. To emphasize non-trivial benchmarks, we omit the benchmarks that are solved in less than a second by all solvers.

**Fig. 9.** Scatter plots showing the impact of disabling simplification techniques in cvc4 on both satisfiable and unsatisfiable benchmarks. All benchmarks ran with a timeout of 600 s.

The arithmetic-based simplification techniques have the most significant performance impact on the CMU symbolic execution benchmarks. The number of solved benchmarks is significantly lower when disabling those techniques. The scatter plot shows that for longer-running satisfiable queries, a large portion of the benchmarks are solved up to an order of magnitude faster with the simplifications. These improvements in runtime on the CMU set are particularly compelling because they come from a symbolic execution application, which involves a large number of queries with a short timeout. The improvements are more pronounced for unsatisfiable benchmarks, where our results show that simplifications often give the solver the ability to derive a refutation in a matter of seconds, something that is infeasible with configurations without these techniques. The Aplas set contains no extended string operators and hence our arithmetic-based simplification techniques have little impact on this set.

In contrast, both containment and multiset-based rewrites have a high impact on the Aplas set, as **-contain** and **-msets** both solve 121 fewer benchmarks. Additionally, **-contain** has a high impact on the TermEq set, where the simplifications enable the best configuration to solve 61 out of 80 benchmarks. Since these techniques apply most frequently to looping word equations, they are less important for the CMU set, which does not have such equations. The containment-based and multiset-based techniques primarily help on unsatisfiable benchmarks, as shown in the scatter plots. On TermEq benchmarks, it tends to be easier to find counterexamples, i.e., to solve the satisfiable ones, so there is more to gain on unsatisfiable benchmarks.

On Slog, Ostrich solves two more instances than cvc4, but cvc4 is over 50 times faster on commonly solved instances while supporting a richer set of string operators. On all benchmark sets, cvc4 solves at least as many benchmarks as z3, and cvc4 has 12× fewer timeouts than z3. On the simplified benchmarks, z3 performs significantly better. On the CMU and the Aplas benchmarks, z3<sub>b</sub> outperforms z3 by a large margin. Additionally applying the simplification techniques presented in this paper improves performance further on most benchmark sets and allows z3<sub>f</sub> to solve the most unsatisfiable benchmarks overall. These results indicate that z3 could benefit from additional simplifications, and they underscore the importance of curating and publishing simplification techniques in order to improve the state of the art.

## **8 Conclusion**

We have presented a set of aggressive simplification techniques for reasoning about extended string constraints. Our results suggest that such techniques are key to advancing the state of the art in SMT string solving. Arithmetic-based simplifications lead to significant speedups in benchmarks from a symbolic execution application, while containment and multiset-based simplifications improve the performance on problems consisting of difficult term equivalences and looping word equations. Our approach is not limited to cvc4 and can be adapted to other solvers.

Given the encouraging results for each of the simplification techniques in our evaluation, we plan to extend them to other types of abstraction and make them context-aware. The latter extension involves taking into account other assertions when checking whether a side condition of a rule is fulfilled.

**Acknowledgements.** This work was partially supported by the National Science Foundation under award 1656926, the Defense Advanced Research Projects Agency under award FA8650-18-2-7854, and Amazon Web Services.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Alternating Automata Modulo First Order Theories**

Radu Iosif and Xiao Xu

CNRS, Verimag, Université de Grenoble Alpes, Grenoble, France Radu.Iosif@univ-grenoble-alpes.fr, Xiao.Xu@univ-grenoble-alpes.fr

**Abstract.** We introduce first-order alternating automata, a generalization of boolean alternating automata, in which transition rules are described by multisorted first-order formulae, with states and internal variables given by uninterpreted predicate terms. The model is closed under union, intersection and complement, and its emptiness problem is undecidable, even for the simplest data theory of equality. To cope with the undecidability problem, we develop an abstraction refinement semi-algorithm based on lazy annotation of the symbolic execution paths with interpolants, obtained by applying (i) quantifier elimination with witness term generation and (ii) Lyndon interpolation in the quantifier-free theory of the data domain, with uninterpreted predicate symbols. This provides a method for checking inclusion of timed and finite-memory register automata, and emptiness of quantified predicate automata, previously used in the verification of parameterized concurrent programs, composed of replicated threads, with shared memory.

#### **1 Introduction**

Many results in automata theory rely on the finite alphabet hypothesis, which guarantees, in some cases, the existence of determinization, complementation and inclusion checking methods. However, this hypothesis prevents the use of automata as models of real-time systems or even simple programs, whose input and output are data values ranging over very large domains, typically viewed as infinite mathematical abstractions.

Traditional attempts to generalize classical Rabin-Scott automata to infinite alphabets, such as timed automata [1] and finite-memory automata [16], face the *complement closure* problem: there exist automata for which the complement language cannot be recognized by an automaton in the same class. This makes it impossible to encode a language inclusion problem $\mathcal{L}(\mathcal{A}) \subseteq \mathcal{L}(\mathcal{B})$ as the emptiness of an automaton recognizing the language $\mathcal{L}(\mathcal{A}) \cap \mathcal{L}^c(\mathcal{B})$, where $\mathcal{L}^c(\mathcal{B})$ denotes the complement of $\mathcal{L}(\mathcal{B})$.

Even for finite alphabets, complementation of finite-state automata faces an inherent exponential blowup, due to nondeterminism. However, if we allow universal nondeterminism, in addition to the classical existential nondeterminism, complementation is possible in linear time. Having both existential and universal nondeterminism defines the *alternating automata* model [4]. A finite-alphabet alternating automaton is described by a set of transition rules $q \xrightarrow{a} \phi$, where $q$ is a state, $a$ is an input symbol and $\phi$ is a boolean formula whose propositional variables denote successor states.
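The linear-time complementation of propositional alternating automata can be sketched in Python, assuming acceptance of finite words from a single initial state with a set of final states (a simplification of the general model): dualize every transition formula, exchanging ∧ with ∨ and true with false, and complement the set of final states. Correctness follows by induction on the word length and De Morgan's laws. The formula encoding and function names are hypothetical.

```python
from itertools import product

# Positive boolean formulas as tuples:
#   ("true",), ("false",), ("state", q), ("and", f, g), ("or", f, g)


def holds(f, succ):
    """Evaluate f, where succ(q) says whether successor state q accepts."""
    tag = f[0]
    if tag == "true":
        return True
    if tag == "false":
        return False
    if tag == "state":
        return succ(f[1])
    if tag == "and":
        return holds(f[1], succ) and holds(f[2], succ)
    return holds(f[1], succ) or holds(f[2], succ)


def accepts(aut, q, word):
    """Alternating acceptance from state q: every chosen branch must consume
    the whole word and end in a final state."""
    delta, final = aut
    if not word:
        return q in final
    rule = delta.get((q, word[0]), ("false",))  # missing rule: reject
    return holds(rule, lambda p: accepts(aut, p, word[1:]))


def dual(f):
    """Swap and/or and true/false; the size of f is unchanged."""
    tag = f[0]
    if tag == "true":
        return ("false",)
    if tag == "false":
        return ("true",)
    if tag == "state":
        return f
    return ("or" if tag == "and" else "and", dual(f[1]), dual(f[2]))


def complement(aut, states, alphabet):
    """Dualize every rule (missing rules become true) and flip final states."""
    delta, final = aut
    new_delta = {
        (q, a): dual(delta.get((q, a), ("false",)))
        for q in states
        for a in alphabet
    }
    return new_delta, set(states) - set(final)


# Illustrative automaton with the rule q0 --a--> (q1 and q2) or q3
delta = {
    ("q0", "a"): ("or", ("and", ("state", "q1"), ("state", "q2")), ("state", "q3")),
    ("q1", "b"): ("state", "q1"),
    ("q2", "b"): ("state", "q2"),
}
aut = (delta, {"q1", "q2"})
comp = complement(aut, ["q0", "q1", "q2", "q3"], "ab")
words = ["".join(w) for n in range(5) for w in product("ab", repeat=n)]
print(all(accepts(aut, "q0", w) != accepts(comp, "q0", w) for w in words))  # True
```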

*Our Contribution.* We extend alternating automata to infinite data alphabets by defining a model of computation in which all boolean operations, including complementation, can be done in linear time. The control states are given by $k$-ary predicate symbols $q(y_1,\ldots,y_k)$, the input consists of an event $a$ from a finite alphabet and a tuple of data variables $x_1,\ldots,x_n$, ranging over an infinite domain, and transitions are of the form $q(y_1,\ldots,y_k) \xrightarrow{a(x_1,\ldots,x_n)} \phi(x_1,\ldots,x_n,y_1,\ldots,y_k)$, where $\phi$ is a formula in the first-order theory of the data domain. In this model, the arguments of a predicate atom $q(y_1,\ldots,y_k)$ represent the values of the *internal variables* associated with the state. Together with the input values $x_1,\ldots,x_n$, these values define the next configurations, but they remain invisible in the input sequence.

The tight coupling of internal values and control states, by means of uninterpreted predicate symbols, allows for linear-time complementation just as in the case of classical propositional alternating automata. Complementation is, moreover, possible when the transition formulae contain first-order quantifiers, generating infinitely-branching execution trees. The price to be paid for this expressivity is that emptiness of first-order alternating automata is undecidable, even for the simplest data theory of equality [6].

The main contribution of this paper is an effective emptiness checking semi-algorithm for first-order alternating automata, in the spirit of the IMPACT lazy annotation procedure, originally developed for checking safety of nondeterministic integer programs [20,21]. In a nutshell, a lazy annotation procedure unfolds an automaton $\mathcal{A}$, trying to find an execution that recognizes a word from $\mathcal{L}(\mathcal{A})$. If a path that reaches a final state does not correspond to a concrete run of the automaton, the positions on the path are labeled with interpolants from the proof of infeasibility, thus marking this path and all its continuations as infeasible for future searches. Termination of lazy annotation procedures is not guaranteed, but a suitable coverage relation between the nodes of the search tree may ensure convergence on many real-life examples. However, applying lazy annotation to first-order alternating automata faces two nontrivial problems:


with conjunctions of existentially quantified interpolants combining predicate atoms with data constraints.

We use first-order alternating automata to develop practical semi-algorithms for a number of known undecidable problems, such as inclusion of regular timed languages [1], inclusion of quasi-regular languages recognized by finite-memory automata [16] and emptiness of predicate automata, a subclass of first-order alternating automata used to verify parameterized concurrent programs [6,7].

*Related Work.* Recognizers for languages over infinite alphabets have found various applications, ranging from Unicode text recognition [5] to runtime program monitoring [2]. Extending finite automata to infinite alphabets has been considered in the context of *symbolic alternating finite automata* (s-AFA), whose transitions are labeled with guards taken from a decidable theory of the data domain [5]. As in our model, s-AFA are closed under union, intersection and complement; unlike in our model, their emptiness is decidable, due to the lack of registers. However, s-AFA are strictly less expressive than our model, because they cannot compare data at different positions in the input word.

*Constrained Horn clauses* (CHC) are a branching computation model widespread in program verification [9]. The main difference between alternating and bottom-up branching computations is that, in an alternating model, all branches of the computation must synchronize on the same input word. With this in mind, it is possible to express emptiness of first-order alternating automata as the existence of solutions of a CHC over a higher-order theory of data, extended with algebraic data types (lists). The effectiveness of such an encoding depends on the effectiveness of interpolation and witness term generation for theories of algebraic data types [11].

The alternating automata model presented in this paper extends the alternating automata with variables ranging over infinite data considered in [14]. There, all variables were required to be observable in the input. We overcome this restriction by allowing internal (invisible) variables. Another closely related work [13] considers the inclusion between an asynchronous product of automata $\mathcal{A}_1 \times \ldots \times \mathcal{A}_n$, extended with data variables, and a monitor automaton $\mathcal{B}$. The semi-algorithm defined there was based on the assumption that all variables of the observer $\mathcal{B}$ must be declared in the automata $\mathcal{A}_1,\ldots,\mathcal{A}_n$ under check. This limitation can now be bypassed, since the inclusion problem can be encoded as emptiness of a first-order alternating automaton and, moreover, the emptiness checking semi-algorithm can handle invisible variables.

The work probably closest to ours concerns the model of *predicate automata* (PA) [6,7,17], used in the verification of parameterized concurrent programs with shared memory. In this model, the alphabet consists of pairs of program statements and thread identifiers and is considered infinite because the number of threads is unbounded. Since thread identifiers can only be compared for equality, the data theory in PA is the theory of equality. Even with this simplification, the emptiness problem is undecidable when either the predicates have arity greater than one [6] or use quantified transition rules [17]. Checking emptiness of quantifier-free PA is possible semi-algorithmically, by explicitly enumerating reachable configurations and checking coverage by looking for permutations of argument values. However, no semi-algorithm has been given for quantified PA. Dealing with quantified transition rules is one of our contributions.

#### **1.1 Preliminaries**

For two integers $0 \leq i \leq j$, we define $[i, j] \stackrel{\mathrm{def}}{=} \{i, \ldots, j\}$ and $[i] \stackrel{\mathrm{def}}{=} [0, i]$. We consider two disjoint sorts $\mathbb{D}$ and $\mathbb{B}$, where $\mathbb{D}$ is an infinite domain and $\mathbb{B} = \{\top, \bot\}$ is the set of boolean values true ($\top$) and false ($\bot$), respectively. The $\mathbb{D}$ sort is equipped with countably many function symbols $f : \mathbb{D}^{\#(f)} \to \mathbb{D} \cup \mathbb{B}$, where $\#(f) \geq 0$ denotes the number of arguments (arity) of $f$. A *predicate* is a function symbol $p : \mathbb{D}^{\#(p)} \to \mathbb{B}$, i.e., a $\#(p)$-ary relation.

We consider the interpretation of all function symbols $f : \mathbb{D}^{\#(f)} \to \mathbb{D}$ to be fixed by the interpretation of the $\mathbb{D}$ sort; for instance, if $\mathbb{D}$ is the set of integers $\mathbb{Z}$, these are zero, the successor function and the arithmetic operations of addition and multiplication. We extend this convention to several predicates over $\mathbb{D}$, such as the inequality relation over $\mathbb{Z}$, and write $\mathsf{Pred}$ for the set of remaining *uninterpreted predicates*.

Let Var <sup>=</sup> {x, y, z, . . .} be a countably infinite set of variables, ranging over D. Terms are either constants of sort D, variables or function applications f(t1,...,t#(f)), where t1,...,t#(f) are terms. The set of first-order formulae is defined by the syntax below:

$$\phi := t = s \mid p(t_1, \ldots, t_{\#(p)}) \mid \neg \phi_1 \mid \phi_1 \land \phi_2 \mid \exists x \,.\, \phi_1$$

where $t, s, t_1,\ldots,t_{\#(p)}$ denote terms and $p$ is a predicate symbol. We write $\phi_1 \lor \phi_2$, $\phi_1 \rightarrow \phi_2$ and $\forall x \,.\, \phi_1$ for $\neg(\neg\phi_1 \land \neg\phi_2)$, $\neg\phi_1 \lor \phi_2$ and $\neg\exists x \,.\, \neg\phi_1$, respectively. $\mathrm{FV}(\phi)$ is the set of free variables in $\phi$, and the size $|\phi|$ of a formula $\phi$ is the number of symbols needed to write it down. A *sentence* is a formula $\phi$ with no free variables. A formula is *positive* if each uninterpreted predicate symbol occurs under an even number of negations, and we denote by $\mathsf{Form}^+(Q, X)$ the set of positive formulae with predicates from the set $Q \subseteq \mathsf{Pred}$ and free variables from the set $X \subseteq \mathsf{Var}$. A formula is in *prenex form* if it is of the form $\varphi = Q_1 x_1 \ldots Q_n x_n \,.\, \phi$, where $Q_1, \ldots, Q_n \in \{\exists, \forall\}$ and $\phi$ has no quantifiers. In this case we call $\phi$ the *matrix* of $\varphi$. Every first-order formula can be written in prenex form by renaming each quantified variable to a unique name and moving the quantifiers upfront.
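The positivity condition can be checked by tracking the number of enclosing negations. The Python sketch below works over the core syntax only (so ∨ and → are first expanded into ¬ and ∧), treats every `pred` atom as uninterpreted and every `eq` atom as unrestricted; the tuple encoding and helper names are hypothetical.

```python
# Formulas over the core syntax: ("eq", t, s), ("pred", p, args),
# ("not", f), ("and", f, g), ("exists", x, f)


def is_positive(phi, negations=0):
    """True if every uninterpreted predicate atom occurs under an even
    number of negations (equality atoms are unrestricted)."""
    tag = phi[0]
    if tag == "eq":
        return True
    if tag == "pred":
        return negations % 2 == 0
    if tag == "not":
        return is_positive(phi[1], negations + 1)
    if tag == "and":
        return is_positive(phi[1], negations) and is_positive(phi[2], negations)
    if tag == "exists":
        return is_positive(phi[2], negations)
    raise ValueError(f"unknown connective: {tag}")


def disj(f, g):
    """f or g, encoded as not(not f and not g)."""
    return ("not", ("and", ("not", f), ("not", g)))


def implies(f, g):
    """f implies g, encoded as (not f) or g."""
    return disj(("not", f), g)


q1 = ("pred", "q1", ("y",))
q2 = ("pred", "q2", ("y",))
print(is_positive(disj(q1, q2)))     # True: each atom sits under two negations
print(is_positive(implies(q1, q2)))  # False: q1 sits under three negations
```

This is why transition rules may freely use ∧, ∨, ∃ and ∀ over state predicates, but not place a state predicate on the left of an implication.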

An *interpretation* $\mathcal{I}$ maps each predicate symbol $p$ into a set $p^{\mathcal{I}} \subseteq \mathbb{D}^{\#(p)}$ if $\#(p) > 0$, or into an element of $\mathbb{B}$ if $\#(p) = 0$. A *valuation* $\nu$ maps each variable $x$ into an element of $\mathbb{D}$. Given a term $t$, we denote by $t^{\nu}$ the value obtained by replacing each variable $x$ by the value $\nu(x)$ and evaluating each function application. For a formula $\phi$, we define the forcing relation $\mathcal{I}, \nu \models \phi$ recursively on the structure of $\phi$, as usual. For a formula $\phi$ and a valuation $\nu$, we define $[\![\phi]\!]_{\nu} \stackrel{\mathrm{def}}{=} \{\mathcal{I} \mid \mathcal{I}, \nu \models \phi\}$ and drop the $\nu$ subscript for sentences. A sentence $\phi$ is *satisfiable* if $[\![\phi]\!] \neq \emptyset$. An element of $[\![\phi]\!]$ is called a *model* of $\phi$. A formula $\phi$ is *valid* if $\mathcal{I}, \nu \models \phi$ for every interpretation $\mathcal{I}$ and every valuation $\nu$. We say that $\phi$ *entails* $\psi$, written $\phi \models \psi$, if and only if $[\![\phi]\!] \subseteq [\![\psi]\!]$.

Interpretations are partially ordered by the pointwise subset order, defined as $\mathcal{I}_1 \subseteq \mathcal{I}_2$ if and only if $p^{\mathcal{I}_1} \subseteq p^{\mathcal{I}_2}$ for each predicate symbol $p \in \mathsf{Pred}$. Given a formula $\phi$ and a valuation $\nu$, we define $[\![\phi]\!]^{\mu}_{\nu} \stackrel{\mathrm{def}}{=} \{\mathcal{I} \mid \mathcal{I}, \nu \models \phi,\ \forall \mathcal{I}' \subsetneq \mathcal{I} \,.\, \mathcal{I}', \nu \not\models \phi\}$, the set of minimal interpretations that, together with $\nu$, form models of $\phi$.

#### **2 First Order Alternating Automata**

Let $\Sigma$ be a finite alphabet of *input events*. Given a finite set of variables $X \subseteq \mathsf{Var}$, we denote by $X \mapsto \mathbb{D}$ the set of valuations of the variables $X$, and let $\Sigma[X] = \Sigma \times (X \mapsto \mathbb{D})$ be the possibly infinite set of *data symbols* $(a, \nu)$, where $a$ is an input symbol and $\nu$ is a valuation. A *data word* (simply called word in the following) is a finite sequence $w = (a_1, \nu_1)(a_2, \nu_2)\ldots(a_n, \nu_n)$ of data symbols. Given a word $w$, we denote by $w_{\Sigma} \stackrel{\mathrm{def}}{=} a_1 \ldots a_n$ its sequence of input events and by $w_{\mathbb{D}}$ the valuation associating to each time-stamped variable $x^{(i)}$, where $x \in \mathsf{Var}$, the value $\nu_i(x)$, for all $i \in [1, n]$. We denote by $\varepsilon$ the empty sequence, by $\Sigma^*$ the set of finite input sequences and by $\Sigma[X]^*$ the set of finite data words over the variables $X$.
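For concreteness, a small Python sketch of the two projections of a data word, assuming events are single-character strings and a valuation is a dict from variable names to values; the pair `(x, i)` stands for the time-stamped variable $x^{(i)}$, and the function name is hypothetical.

```python
def project(word):
    """Split a data word [(a1, nu1), ..., (an, nun)] into its event sequence
    w_Sigma and the valuation w_D of time-stamped variables x^(i)."""
    events = "".join(a for a, _ in word)
    stamped = {
        (x, i): value
        for i, (_, nu) in enumerate(word, start=1)  # positions start at 1
        for x, value in nu.items()
    }
    return events, stamped


w = [("a", {"x": 1, "y": 0}), ("b", {"x": 5, "y": 2})]
events, stamped = project(w)
print(events)   # ab
print(stamped)
```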

A *first-order alternating automaton* is a tuple $\mathcal{A} = \langle \Sigma, X, Q, \iota, F, \Delta \rangle$, where $\Sigma$ is a finite set of input events, $X$ is a finite set of input variables, $Q$ is a finite set of predicates denoting control states, $\iota \in \mathsf{Form}^+(Q, \emptyset)$ is a sentence defining initial configurations, $F \subseteq Q$ is the set of predicates denoting final states and $\Delta$ is a set of *transition rules*. A transition rule is of the form $q(y_1,\ldots,y_{\#(q)}) \xrightarrow{a(X)} \psi$, where $q \in Q$ is a predicate, $a \in \Sigma$ is an input event and $\psi \in \mathsf{Form}^+(Q, X \cup \{y_1,\ldots,y_{\#(q)}\})$ is a positive formula, where $X \cap \{y_1,\ldots,y_{\#(q)}\} = \emptyset$. Without loss of generality, we consider, for each predicate $q \in Q$ and each input event $a \in \Sigma$, at most one such rule, as two or more rules can be joined using disjunction. The quantifiers occurring in the right-hand side formula of a transition rule are called *transition quantifiers*. The *size* of $\mathcal{A}$ is $|\mathcal{A}| \stackrel{\mathrm{def}}{=} |\iota| + \sum\{|\psi| \mid q(\mathbf{y}) \xrightarrow{a(X)} \psi \in \Delta\}$.

The semantics of first-order alternating automata is analogous to the semantics of propositional alternating automata, with rules of the form $q \xrightarrow{a} \phi$, where $q$ is a propositional variable and $\phi$ a positive boolean combination of propositional variables. For instance, $q_0 \xrightarrow{a} (q_1 \land q_2) \lor q_3$ means that the automaton can choose to transition either into both $q_1$ and $q_2$, or into $q_3$ alone. This leads to defining transitions as the *minimal models* of the right-hand side of a rule<sup>1</sup>. The original definition of alternating automata [4] works around this problem and considers boolean valuations instead of formulae. In contrast, a finite description of a first-order alternating automaton cannot be given in terms of interpretations, as a first-order formula may have infinitely many models, corresponding to infinitely many initial or successor states occurring within an execution step.
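In the propositional case, minimal models can be enumerated by brute force. The Python sketch below, exponential in the number of atoms and therefore only illustrative, represents a formula as a Python predicate over a dict of truth values (an assumption of the sketch) and returns each minimal model as the set of atoms it sets to true; for $(q_1 \land q_2) \lor q_3$ it yields exactly $\{q_3\}$ and $\{q_1, q_2\}$.

```python
from itertools import product


def minimal_models(formula, atoms):
    """Brute-force the minimal models of a propositional formula; each model
    is returned as the frozenset of atoms it sets to true."""
    models = []
    for bits in product([False, True], repeat=len(atoms)):
        valuation = dict(zip(atoms, bits))
        if formula(valuation):
            models.append(frozenset(a for a in atoms if valuation[a]))
    # keep a model only if no other model is a strict subset of it
    return [m for m in models if not any(other < m for other in models)]


def rule(v):
    """Right-hand side of q0 --a--> (q1 and q2) or q3."""
    return (v["q1"] and v["q2"]) or v["q3"]


print(minimal_models(rule, ["q1", "q2", "q3"]))
```

The valuation making all three atoms true satisfies the formula but is discarded, since both minimal models are strict subsets of it.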

Given an uninterpreted predicate symbol $q \in Q$ and data values $d_1,\ldots,d_{\#(q)} \in \mathbb{D}$, the tuple $(q, d_1,\ldots,d_{\#(q)})$ is called a *configuration*, sometimes written $q(d_1,\ldots,d_{\#(q)})$ when no confusion arises. A configuration is *final* if $q \in F$. An interpretation $\mathcal{I}$ corresponds to a set of configurations $c(\mathcal{I}) \stackrel{\mathrm{def}}{=} \{(q, d_1,\ldots,d_{\#(q)}) \mid q \in Q,\ (d_1,\ldots,d_{\#(q)}) \in q^{\mathcal{I}}\}$, called a *cube*. This notation is lifted to sets of configurations in the usual way.

<sup>1</sup> Both $\{q_1 \leftarrow \top, q_2 \leftarrow \top, q_3 \leftarrow \bot\}$ and $\{q_1 \leftarrow \bot, q_2 \leftarrow \bot, q_3 \leftarrow \top\}$ are minimal models; however, $\{q_1 \leftarrow \top, q_2 \leftarrow \top, q_3 \leftarrow \top\}$ is a model but is not minimal.

**Definition 1.** *Given a word* $w = (a_1, \nu_1)\ldots(a_n, \nu_n) \in \Sigma[X]^*$ *and a cube* $c$*, an* execution *of* $\mathcal{A} = \langle \Sigma, X, Q, \iota, F, \Delta \rangle$ *over* $w$*, starting with* $c$*, is a forest* $\mathcal{T} = \{T_1, T_2, \ldots\}$*, where each* $T_i$ *is a tree labeled with configurations, such that:*


An execution $\mathcal{T}$ over $w$, starting with $c$, is *accepting* if and only if all paths in $\mathcal{T}$ have the same length and the frontier of each tree $T \in \mathcal{T}$ is labeled with final configurations. If $\mathcal{A}$ has an accepting execution over $w$ starting with a cube $c \in c([\![\iota]\!]^{\mu})$, then $\mathcal{A}$ *accepts* $w$; we denote by $\mathcal{L}(\mathcal{A})$ the set of words accepted by $\mathcal{A}$. For example, consider the automaton $\mathcal{A} = \langle \{a\}, \{x\}, \{q_0, q_1, q_2, q_f\}, q_0(0), \{q_f\}, \Delta \rangle$, where $\Delta$ is the set: $q_0(y) \xrightarrow{a(x)} q_1(y + x) \wedge q_2(y - x)$, $q_1(y) \xrightarrow{a(x)} q_1(y + x) \vee (y > 0 \wedge q_f)$ and $q_2(y) \xrightarrow{a(x)} q_2(y - x) \vee (y > 0 \wedge q_f)$. A possible execution tree of this automaton is the following:

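The execution tree is not accepting, since its frontier is not labeled with final configurations everywhere. Incidentally, here we have $\mathcal{L}(\mathcal{A}) = \emptyset$: the two branches started by $q_0$ carry opposite values $y$ and $-y$ and, since all paths must have the same length, both would have to move to $q_f$ at the same step, which requires $y > 0$ and $-y > 0$ simultaneously. Our tool proves this emptiness in about 0.5 s on an average machine.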
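To make the example concrete, here is a small Python sketch (our own illustration, not the tool mentioned above) that searches for accepting executions of this automaton on short words, with input values drawn from a small range. Each rule is encoded by hand as its list of minimal models, and a node labeled with $q_f$ that still has input left counts as a failed path, because all paths of an accepting execution must have the same length. A bounded search cannot prove emptiness; it can only agree with it on the words it tries.

```python
from itertools import product

FINAL = {"qf"}  # q_f is nullary; its configurations carry the value None


def minimal_models(pred, y, x):
    """Minimal models of the rule for pred(y) reading a(x), each a list of children.

    Returns None when no rule applies (only q_f in this example).
    """
    if pred == "q0":                      # q1(y + x) and q2(y - x)
        return [[("q1", y + x), ("q2", y - x)]]
    if pred in ("q1", "q2"):              # q(y +/- x)  or  (y > 0 and q_f)
        nxt = y + x if pred == "q1" else y - x
        models = [[(pred, nxt)]]
        if y > 0:
            models.append([("qf", None)])
        return models
    return None


def accepts(config, xs):
    """Is there an accepting execution tree rooted at config that reads xs?"""
    pred, y = config
    if not xs:
        return pred in FINAL              # the frontier must be final
    models = minimal_models(pred, y, xs[0])
    if models is None:
        return False                      # this path would be shorter than the others
    return any(all(accepts(child, xs[1:]) for child in model) for model in models)


def accepted_words(max_len, values):
    """Input values x1..xn (n <= max_len, xi in values) of words accepted from q0(0)."""
    return [xs for n in range(max_len + 1)
            for xs in product(values, repeat=n)
            if accepts(("q0", 0), list(xs))]


print(accepted_words(4, range(-3, 4)))   # prints []
```

The helper names (`minimal_models`, `accepts`, `accepted_words`) are hypothetical; the search is exponential in the word length and only meant for tiny bounds.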

In the rest of this paper, we are concerned with the following problems:


For technical reasons, we address the following problem next: given an automaton $\mathcal{A}$ and an input sequence $\alpha \in \Sigma^{\ast}$, does there exist a word $w \in \mathcal{L}(\mathcal{A})$ such that $w_\Sigma = \alpha$? By solving this problem first, we develop the machinery required to prove that first-order alternating automata are closed under complement and, further, lay the groundwork for a practical semi-algorithm for the emptiness problem.

#### **2.1 Path Formulae**

In the upcoming developments it is sometimes more convenient to work with logical formulae defining executions of automata than with low-level execution forests. For this reason, we first introduce *path formulae* $\Theta(\alpha)$, which define the executions of an automaton over words that share a given sequence $\alpha$ of input events. Second, we restrict a path formula $\Theta(\alpha)$ to an *acceptance formula* $\Upsilon(\alpha)$, which defines only the accepting executions among those defined by $\Theta(\alpha)$. Consequently, the automaton accepts a word $w$ such that $w_\Sigma = \alpha$ if and only if $\Upsilon(\alpha)$ is satisfiable.

Let $\mathcal{A} = \langle \Sigma, X, Q, \iota, F, \Delta \rangle$ be an automaton for the rest of this section. For any $i \in \mathbb{N}$, we denote by $Q^{(i)} = \{q^{(i)} \mid q \in Q\}$ and $X^{(i)} = \{x^{(i)} \mid x \in X\}$ the sets of time-stamped predicate symbols and variables, respectively. We also define $Q^{(\le n)} \stackrel{\text{def}}{=} \{q^{(i)} \mid q \in Q, i \in [n]\}$ and $X^{(\le n)} \stackrel{\text{def}}{=} \{x^{(i)} \mid x \in X, i \in [n]\}$. For a formula $\psi$ and $i \in \mathbb{N}$, we define $\psi^{(i)} \stackrel{\text{def}}{=} \psi[X^{(i)}/X, Q^{(i)}/Q]$, the formula in which all input variables and state predicates (and only those symbols) are replaced by their time-stamped counterparts. Moreover, we write $q(\mathbf{y})$ for $q(y_1, \ldots, y_{\#(q)})$ when no confusion arises.

Given a sequence of input events $\alpha = a_1 \ldots a_n \in \Sigma^{\ast}$, the *path formula* of $\alpha$ is:

$$\Theta(\alpha) \stackrel{\text{def}}{=} \iota^{(0)} \wedge \bigwedge_{i=1}^{n} \bigwedge_{q(\mathbf{y}) \xrightarrow{a_{i}(X)} \psi \in \Delta} \forall y_{1} \ldots \forall y_{\#(q)} \,.\, q^{(i-1)}(\mathbf{y}) \to \psi^{(i)} \tag{1}$$

The automaton $\mathcal{A}$ to which $\Theta(\alpha)$ refers will always be clear from the context. To formalize the relation between the low-level configuration-based execution semantics and path formulae, consider a word $w = (a_1, \nu_1) \ldots (a_n, \nu_n) \in \Sigma[X]^{\ast}$. Any execution $\mathcal{T}$ of $\mathcal{A}$ over $w$ has an associated interpretation $\mathcal{I}_{\mathcal{T}}$ of the time-stamped predicates $Q^{(\le n)}$:

$$\mathcal{I}_{\mathcal{T}}(q^{(i)}) \stackrel{\text{def}}{=} \{(d_1, \ldots, d_{\#(q)}) \mid (q, d_1, \ldots, d_{\#(q)}) \text{ labels a node on level } i \text{ in } \mathcal{T}\}, \quad \forall q \in Q \;\; \forall i \in [n]$$
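The interpretation $\mathcal{I}_{\mathcal{T}}$ simply groups the labels of $\mathcal{T}$ by predicate symbol and level. A minimal Python sketch, assuming a forest is given as a list of `(label, children)` pairs with `label = (q, d1, ..., dk)` (an encoding chosen for illustration, not taken from the paper):

```python
from collections import defaultdict


def interpretation_of(forest):
    """Map each time-stamped predicate (q, i) to the data tuples labeling level i.

    Roots are on level 0, their children on level 1, and so on.
    """
    interp = defaultdict(set)

    def visit(node, level):
        label, children = node
        interp[(label[0], level)].add(tuple(label[1:]))
        for child in children:
            visit(child, level + 1)

    for root in forest:
        visit(root, 0)
    return dict(interp)


# One step of the automaton with rules for q0, q1, q2 above, on the word a(x = 2).
tree = (("q0", 0), [(("q1", 2), []), (("q2", -2), [])])
print(interpretation_of([tree]))
# {('q0', 0): {(0,)}, ('q1', 1): {(2,)}, ('q2', 1): {(-2,)}}
```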

**Lemma 1.** *Given an automaton* $\mathcal{A} = \langle \Sigma, X, Q, \iota, F, \Delta \rangle$*, for any word* $w = (a_1, \nu_1) \ldots (a_n, \nu_n)$*, we have* $[\![\Theta(w_\Sigma)]\!]^{\mu}_{w_D} = \{\mathcal{I}_{\mathcal{T}} \mid \mathcal{T} \text{ is an execution of } \mathcal{A} \text{ over } w\}$*.*

Next, we give a logical characterization of acceptance, relative to a given sequence of input events $\alpha \in \Sigma^{\ast}$. To this end, we constrain the path formula $\Theta(\alpha)$ by requiring that only final states of $\mathcal{A}$ occur on the last level of the execution. The result is the *acceptance formula* for $\alpha$:

$$\Upsilon(\alpha) \stackrel{\text{def}}{=} \Theta(\alpha) \land \bigwedge_{q \in Q \setminus F} \forall y_1 \ldots \forall y_{\#(q)} \,.\, q^{(n)}(\mathbf{y}) \to \bot \tag{2}$$
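The following Python sketch assembles Equations (1) and (2) as plain strings for the automaton of Example 1 below, to show where the time stamps go. The formula syntax (`&` for conjunction, `->` for implication, `false` for $\bot$), the regular-expression renaming in `timestamp`, and all function names are assumptions of this sketch. It presumes that bound variable names (such as `y`, `z`) differ from input variable and predicate names, and it only builds text; it does not check satisfiability.

```python
import re


def timestamp(formula, i, inputs, preds):
    """psi^(i): rename input variables and state predicates (only those) to name^(i)."""
    names = sorted(inputs | preds, key=len, reverse=True)  # prefer longer names
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(lambda m: f"{m.group(1)}^({i})", formula)


def implication(q, i, params, body):
    """forall params . q^(i)(params) -> body (no quantifier for a nullary q)."""
    ys = ", ".join(params)
    if not ys:
        return f"{q}^({i}) -> {body}"
    return f"forall {ys}. {q}^({i})({ys}) -> {body}"


def acceptance_formula(iota, rules, alpha, arity, final, inputs):
    """Conjuncts of Upsilon(alpha): Equation (1), then the extra conjuncts of (2).

    rules: list of (q, params, event, rhs), read as q(params) --event(X)--> rhs.
    """
    preds = set(arity)
    conjuncts = [timestamp(iota, 0, inputs, preds)]   # iota^(0)
    for i, a in enumerate(alpha, start=1):            # rules fired at step i
        for q, params, event, rhs in rules:
            if event == a:
                body = timestamp(rhs, i, inputs, preds)
                conjuncts.append(implication(q, i - 1, params, f"({body})"))
    n = len(alpha)
    for q in sorted(preds - final):                   # non-final states at level n
        params = [f"y{j}" for j in range(1, arity[q] + 1)]
        conjuncts.append(implication(q, n, params, "false"))
    return conjuncts


iota = "exists z. z >= 0 & q(z)"
rules = [("q", ["y"], "a1", "x >= 0 & forall z. z >= y -> q(x + z)"),
         ("q", ["y"], "a2", "y < 0 & qf(x + y)")]
for c in acceptance_formula(iota, rules, ["a1", "a2"], {"q": 1, "qf": 1}, {"qf"}, {"x"}):
    print(c)
# exists z. z >= 0 & q^(0)(z)
# forall y. q^(0)(y) -> (x^(1) >= 0 & forall z. z >= y -> q^(1)(x^(1) + z))
# forall y. q^(1)(y) -> (y < 0 & qf^(2)(x^(2) + y))
# forall y1. q^(2)(y1) -> false
```

The word boundaries in the regular expression keep `exists` from being read as containing the input variable `x`, and keep `qf` distinct from `q`.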

The top-level universal quantifiers of a subformula $\forall y_1 \ldots \forall y_{\#(q)} \,.\, q^{(i)}(\mathbf{y}) \to \psi$ of $\Upsilon(\alpha)$ will be referred to as *path quantifiers* in the following. Notice that path quantifiers are distinct from the *transition quantifiers* that occur within a formula $\psi$ of a transition rule $q(y_1, \ldots, y_{\#(q)}) \xrightarrow{a(X)} \psi$ of $\mathcal{A}$. The relation between the words accepted by $\mathcal{A}$ and the acceptance formula above is formally captured by the following lemma:

**Lemma 2.** *Given an automaton* $\mathcal{A} = \langle \Sigma, X, Q, \iota, F, \Delta \rangle$*, for every word* $w \in \Sigma[X]^{\ast}$*, the following are equivalent: (1) there exists an interpretation* $\mathcal{I}$ *such that* $\mathcal{I}, w_D \models \Upsilon(w_\Sigma)$*, and (2)* $w \in \mathcal{L}(\mathcal{A})$*.*

As an immediate consequence, checking whether $\mathcal{A}$ accepts some word $w$ with a given input sequence $w_\Sigma = \alpha$ reduces to checking whether $\Upsilon(\alpha)$ is satisfiable. However, unlike for non-alternating infinite-state models of computation, such as counter automata (nondeterministic programs with integer variables), this satisfiability query for an acceptance (path) formula falls outside the known decidable theories supported by standard SMT solvers. There are two reasons for this, namely (i) the presence of predicate symbols and (ii) the non-trivial alternation of quantifiers. To understand this point, consider, for example, the decidable theory of Presburger arithmetic [24]. Adding even a single monadic predicate symbol to it yields undecidability in the presence of non-trivial quantifier alternation [10]. On the other hand, the quantifier-free fragment of Presburger arithmetic extended with uninterpreted function symbols is decidable, by a Nelson-Oppen style congruence closure argument [22].

To tackle the problem of deciding satisfiability of $\Upsilon(\alpha)$ formulae, we start from the observation that their form is rather particular, which allows the elimination of path quantifiers and uninterpreted predicate symbols by two satisfiability-preserving transformations. The result of applying these transformations is a formula with no predicate symbols, whose only quantifiers are those introduced by the transition rules of the automaton. Next, in Sect. 3, we shall moreover assume that the first-order theory of the data sort $D$ (without uninterpreted predicate symbols) has quantifier elimination, thus providing an effective decision procedure.

For the time being, let us formally define the elimination of path quantifiers and predicate symbols. Let $\alpha = a_1 \ldots a_n$ be a given sequence of input events and let $\alpha_i$ be the prefix $a_1 \ldots a_i$ of $\alpha$, for $i \in [n]$, where $\alpha_0 = \varepsilon$. We consider the sequence of formulae $\widehat{\Theta}(\alpha_0), \ldots, \widehat{\Theta}(\alpha_n)$ defined as $\widehat{\Theta}(\alpha_0) \stackrel{\text{def}}{=} \iota^{(0)}$ and, for all $i \in [1, n]$, we let $\widehat{\Theta}(\alpha_i)$ be the conjunction of $\widehat{\Theta}(\alpha_{i-1})$ with all formulae $q^{(i-1)}(t_1, \ldots, t_{\#(q)}) \to \psi^{(i)}[t_1/y_1, \ldots, t_{\#(q)}/y_{\#(q)}]$, where $q(\mathbf{y}) \xrightarrow{a_i(X)} \psi \in \Delta$ and $q^{(i-1)}(t_1, \ldots, t_{\#(q)})$ occurs in $\widehat{\Theta}(\alpha_{i-1})$, for some terms $t_1, \ldots, t_{\#(q)}$. Next, we write $\widehat{\Upsilon}(\alpha)$ for the conjunction of $\widehat{\Theta}(\alpha_n)$ with all formulae $q^{(n)}(t_1, \ldots, t_{\#(q)}) \to \bot$ such that $q^{(n)}(t_1, \ldots, t_{\#(q)})$ occurs in $\widehat{\Theta}(\alpha_n)$, for some $q \in Q \setminus F$. Note that $\widehat{\Upsilon}(\alpha)$ contains no path quantifiers, as required. On the other hand, the scope of the transition quantifiers in $\widehat{\Upsilon}(\alpha)$ exceeds the right-hand side formulae of the transition rules, as shown by the following example.

*Example 1.* Consider the automaton $\mathcal{A} = \langle \{a_1, a_2\}, \{x\}, \{q, q_f\}, \iota, \{q_f\}, \Delta \rangle$, where:

$$\begin{aligned} \iota &= \exists z \,.\, z \ge 0 \land q(z) \\ \Delta &= \{\, q(y) \xrightarrow{a_1(x)} x \ge 0 \land \forall z \,.\, z \ge y \to q(x+z),\;\; q(y) \xrightarrow{a_2(x)} y < 0 \land q_f(x+y) \,\} \end{aligned}$$

For the input event sequence α = a1a2, the acceptance formula is:

$$\begin{array}{l} \Upsilon(\alpha) = \exists z_1 \,.\, z_1 \ge 0 \land q^{(0)}(z_1) \land \\ \quad \forall y \,.\, q^{(0)}(y) \to [x^{(1)} \ge 0 \land \forall z_2 \,.\, z_2 \ge y \to q^{(1)}(x^{(1)} + z_2)] \land \\ \quad \forall y \,.\, q^{(1)}(y) \to [y < 0 \land q_f^{(2)}(x^{(2)} + y)] \land \\ \quad \forall y \,.\, q^{(2)}(y) \to \bot \end{array}$$

The result of eliminating the path quantifiers, in prenex normal form, is shown below:

$$\begin{array}{l} \widehat{\Upsilon}(\alpha) = \exists z_1 \forall z_2 \,.\, z_1 \ge 0 \land q^{(0)}(z_1) \land \\ \quad \left[ q^{(0)}(z_1) \to x^{(1)} \ge 0 \land \left( z_2 \ge z_1 \to q^{(1)}(x^{(1)} + z_2) \right) \right] \land \\ \quad \left[ q^{(1)}(x^{(1)} + z_2) \to x^{(1)} + z_2 < 0 \land q_f^{(2)}(x^{(2)} + x^{(1)} + z_2) \right] \end{array}$$

Notice that the transition quantifiers $\exists z_1$ and $\forall z_2$ of $\Upsilon(\alpha)$ now range over the whole formula $\widehat{\Upsilon}(\alpha)$. □

**Lemma 3.** *For any input event sequence* $\alpha = a_1 \ldots a_n$ *and each valuation* $\nu : X^{(\le n)} \to D$*, the following hold, for every interpretation* $\mathcal{I}$*: (1) if* $\mathcal{I}, \nu \models \Upsilon(\alpha)$ *then* $\mathcal{I}, \nu \models \widehat{\Upsilon}(\alpha)$*, and (2) if* $\mathcal{I}, \nu \models \widehat{\Upsilon}(\alpha)$*, there exists an interpretation* $\mathcal{J} \subseteq \mathcal{I}$ *such that* $\mathcal{J}, \nu \models \Upsilon(\alpha)$*.*

Further, we eliminate the predicate atoms from $\widehat{\Upsilon}(\alpha)$ by considering the sequence of formulae defined by $\overline{\Theta}(\alpha_0) \stackrel{\text{def}}{=} \iota^{(0)}$ and, for all $i \in [1, n]$, $\overline{\Theta}(\alpha_i)$ obtained by substituting each predicate atom $q^{(i-1)}(t_1, \ldots, t_{\#(q)})$ in $\overline{\Theta}(\alpha_{i-1})$ by $\psi^{(i)}[t_1/y_1, \ldots, t_{\#(q)}/y_{\#(q)}]$, where $q(\mathbf{y}) \xrightarrow{a_i(X)} \psi \in \Delta$. We write $\overline{\Upsilon}(\alpha)$ for the formula obtained by replacing, in $\overline{\Theta}(\alpha_n)$, each occurrence of a predicate $q^{(n)}$ such that $q \in Q \setminus F$ (resp. $q \in F$) by $\bot$ (resp. $\top$).

*Example 2* (*Contd. from Example* 1*).* The result of the elimination of predicate atoms from the acceptance formula in Example 1 is shown below:

$$\overline{\Upsilon}(\alpha) = \exists z_1 \forall z_2 \,.\, z_1 \ge 0 \land \left[ x^{(1)} \ge 0 \land \left( z_2 \ge z_1 \to x^{(1)} + z_2 < 0 \right) \right]$$

This formula is unsatisfiable: instantiating $z_2$ by $z_1$ yields $x^{(1)} + z_1 < 0$, which contradicts $z_1 \ge 0$ and $x^{(1)} \ge 0$. Hence, by Lemma 5 below, no word $w$ with input event sequence $w_\Sigma = a_1 a_2$ is accepted by the automaton $\mathcal{A}$ from Example 1. □
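The refutation can be replayed mechanically. The Python sketch below encodes the matrix of $\overline{\Upsilon}(\alpha)$ from Example 2, writing `x1` for $x^{(1)}$, instantiates the universal variable $z_2$ with the term $z_1$, and checks on a finite grid of integers that this instance is false. The grid check is only an illustration; the argument that holds for all integers is the symbolic one ($z_1 \ge 0$ and $x^{(1)} \ge 0$ contradict $x^{(1)} + z_1 < 0$). The same instantiation idea reappears as witness terms in Sect. 4.

```python
def matrix(z1, z2, x1):
    """Matrix of the formula in Example 2; x1 stands for x^(1)."""
    # (z2 >= z1 -> x1 + z2 < 0) written as a disjunction
    return z1 >= 0 and x1 >= 0 and (z2 < z1 or x1 + z2 < 0)


def witness(z1):
    """Term used to instantiate the universally quantified z2 (illustrative choice)."""
    return z1


grid = range(-10, 11)
# With z2 := z1 the matrix is false at every grid point.
assert not any(matrix(z1, witness(z1), x1) for z1 in grid for x1 in grid)
print(matrix(0, -1, 0))  # prints True: a poorly chosen z2 does not refute the matrix
```

The last line shows why the choice of instance matters: the universal quantifier is refuted only by a suitable term for $z_2$.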

At this point, we state the formal relation between the satisfiability of the formulae $\widehat{\Upsilon}(\alpha)$ and $\overline{\Upsilon}(\alpha)$. Since there are no occurrences of predicates in $\overline{\Upsilon}(\alpha)$, for each valuation $\nu : X^{(\le n)} \to D$, there exists an interpretation $\mathcal{I}$ such that $\mathcal{I}, \nu \models \overline{\Upsilon}(\alpha)$ if and only if $\mathcal{J}, \nu \models \overline{\Upsilon}(\alpha)$ for every interpretation $\mathcal{J}$. In this case we omit $\mathcal{I}$ and simply write $\nu \models \overline{\Upsilon}(\alpha)$.

**Lemma 4.** *For any input event sequence* $\alpha = a_1 \ldots a_n$ *and each valuation* $\nu : X^{(\le n)} \to D$*, there exists an interpretation* $\mathcal{I}$ *such that* $\mathcal{I}, \nu \models \widehat{\Upsilon}(\alpha)$ *if and only if* $\nu \models \overline{\Upsilon}(\alpha)$*.*

Finally, we characterize the acceptance of a word with a given input event sequence by means of a formula in which no predicate atom occurs.

**Lemma 5.** *Given an automaton* $\mathcal{A} = \langle \Sigma, X, Q, \iota, F, \Delta \rangle$*, for every word* $w \in \Sigma[X]^{\ast}$*, we have* $w_D \models \overline{\Upsilon}(w_\Sigma)$ *if and only if* $w \in \mathcal{L}(\mathcal{A})$*.*

#### **2.2 Boolean Closure of First Order Alternating Automata**

Given a positive formula $\phi$, we define the *dual* formula $\phi^{\sim}$ recursively as follows:

$$\begin{array}{lll} (\phi_1 \vee \phi_2)^{\sim} \stackrel{\text{def}}{=} \phi_1^{\sim} \wedge \phi_2^{\sim} & \qquad (\phi_1 \wedge \phi_2)^{\sim} \stackrel{\text{def}}{=} \phi_1^{\sim} \vee \phi_2^{\sim} & \qquad (t = s)^{\sim} \stackrel{\text{def}}{=} t \neq s \\ (\exists x \,.\, \phi_1)^{\sim} \stackrel{\text{def}}{=} \forall x \,.\, \phi_1^{\sim} & \qquad (\forall x \,.\, \phi_1)^{\sim} \stackrel{\text{def}}{=} \exists x \,.\, \phi_1^{\sim} & \qquad (t \neq s)^{\sim} \stackrel{\text{def}}{=} t = s \\ & & \qquad q(x_1, \ldots, x_{\#(q)})^{\sim} \stackrel{\text{def}}{=} q(x_1, \ldots, x_{\#(q)}) \end{array}$$
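The dual is computed by a direct structural recursion. A minimal Python sketch, assuming formulae are nested tuples (an encoding chosen for illustration); like the definition above, it handles only $=$ and $\neq$ among interpreted atoms:

```python
def dual(phi):
    """Dual of a positive formula, following the recursive definition of phi~.

    Encoding (assumed for this sketch):
      ("or", f, g), ("and", f, g), ("exists", x, f), ("forall", x, f),
      ("eq", t, s), ("neq", t, s), ("pred", q, (x1, ..., xk)).
    """
    tag = phi[0]
    if tag == "or":
        return ("and", dual(phi[1]), dual(phi[2]))
    if tag == "and":
        return ("or", dual(phi[1]), dual(phi[2]))
    if tag == "exists":
        return ("forall", phi[1], dual(phi[2]))
    if tag == "forall":
        return ("exists", phi[1], dual(phi[2]))
    if tag == "eq":
        return ("neq", phi[1], phi[2])
    if tag == "neq":
        return ("eq", phi[1], phi[2])
    if tag == "pred":
        return phi  # predicate atoms are left unchanged
    raise ValueError(f"not a positive formula: {phi!r}")


phi = ("or", ("eq", "x", "y"), ("exists", "z", ("pred", "q", ("z",))))
print(dual(phi))
# ('and', ('neq', 'x', 'y'), ('forall', 'z', ('pred', 'q', ('z',))))
```

Each case is undone by its counterpart, so applying `dual` twice returns the original formula, and the result has the same size as the input.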

The following theorem shows closure of automata under all boolean operations. Note that it suffices to show closure under intersection and complement, because $\mathcal{L}(\mathcal{A}_1) \cup \mathcal{L}(\mathcal{A}_2)$ is the complement of the language $\mathcal{L}^c(\mathcal{A}_1) \cap \mathcal{L}^c(\mathcal{A}_2)$, for any two automata $\mathcal{A}_1$ and $\mathcal{A}_2$ with the same input event alphabet and set of input variables.

**Theorem 1.** *Given automata* $\mathcal{A}_i = \langle \Sigma, X, Q_i, \iota_i, F_i, \Delta_i \rangle$*, for* $i = 1, 2$*, such that* $Q_1 \cap Q_2 = \emptyset$*, the following hold:*

*1.* $\mathcal{L}(\mathcal{A}_\cap) = \mathcal{L}(\mathcal{A}_1) \cap \mathcal{L}(\mathcal{A}_2)$*, where* $\mathcal{A}_\cap = \langle \Sigma, X, Q_1 \cup Q_2, \iota_1 \wedge \iota_2, F_1 \cup F_2, \Delta_1 \cup \Delta_2 \rangle$*,*

*2.* $\mathcal{L}(\overline{\mathcal{A}_i}) = \Sigma[X]^{\ast} \setminus \mathcal{L}(\mathcal{A}_i)$*, where* $\overline{\mathcal{A}_i} = \langle \Sigma, X, Q_i, \iota_i^{\sim}, Q_i \setminus F_i, \Delta_i^{\sim} \rangle$ *and* $\Delta_i^{\sim} = \{ q(\mathbf{y}) \xrightarrow{a(X)} \psi^{\sim} \mid q(\mathbf{y}) \xrightarrow{a(X)} \psi \in \Delta_i \}$*, for* $i = 1, 2$*.*

*Moreover,* $|\mathcal{A}_\cap| = O(|\mathcal{A}_1| + |\mathcal{A}_2|)$ *and* $|\overline{\mathcal{A}_i}| = O(|\mathcal{A}_i|)$*, for* $i = 1, 2$*.*
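Both constructions in Theorem 1 are purely syntactic, which is where the linear size bounds come from. The Python sketch below is an illustration under our own encoding: an automaton is a frozen dataclass, formulae are nested tuples, rules are `(q, params, event, rhs)` tuples, and `dual` restates the recursive definition of $\phi^{\sim}$ (connectives, quantifiers, $=$ and $\neq$). Names such as `intersection` and `complement` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


def dual(phi):
    """phi~ for positive formulas encoded as nested tuples."""
    swap = {"or": "and", "and": "or", "exists": "forall", "forall": "exists",
            "eq": "neq", "neq": "eq"}
    tag = phi[0]
    if tag in ("or", "and"):
        return (swap[tag], dual(phi[1]), dual(phi[2]))
    if tag in ("exists", "forall"):
        return (swap[tag], phi[1], dual(phi[2]))
    if tag in ("eq", "neq"):
        return (swap[tag], phi[1], phi[2])
    if tag == "pred":
        return phi  # predicate atoms are self-dual
    raise ValueError(f"not a positive formula: {phi!r}")


@dataclass(frozen=True)
class Automaton:
    sigma: FrozenSet[str]      # input events
    inputs: FrozenSet[str]     # input variables X
    states: FrozenSet[str]     # predicate symbols Q
    iota: tuple                # initial formula
    final: FrozenSet[str]      # final predicates F
    delta: Tuple[tuple, ...]   # rules (q, params, event, rhs)


def intersection(a1, a2):
    """A_cap: union of states and rules, conjunction of the initial formulas."""
    if a1.sigma != a2.sigma or a1.inputs != a2.inputs:
        raise ValueError("automata must share input events and input variables")
    if a1.states & a2.states:
        raise ValueError("state sets must be disjoint")
    return Automaton(a1.sigma, a1.inputs, a1.states | a2.states,
                     ("and", a1.iota, a2.iota), a1.final | a2.final,
                     a1.delta + a2.delta)


def complement(a):
    """Dualize the initial formula and every right-hand side; final := Q \\ F."""
    return Automaton(a.sigma, a.inputs, a.states, dual(a.iota),
                     a.states - a.final,
                     tuple((q, ys, e, dual(rhs)) for q, ys, e, rhs in a.delta))


a = Automaton(frozenset({"a"}), frozenset({"x"}), frozenset({"q", "qf"}),
              ("exists", "z", ("pred", "q", ("z",))), frozenset({"qf"}),
              (("q", ("y",), "a", ("or", ("pred", "q", ("x",)), ("pred", "qf", ("y",)))),))
print(sorted(complement(a).final))      # prints ['q']
print(complement(complement(a)) == a)   # prints True
```

Since `dual` is an involution and $Q \setminus (Q \setminus F) = F$, complementing twice gives back the original automaton, as the last line checks.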

#### **3 The Emptiness Problem**

The emptiness problem is undecidable even for automata with predicates of arity two whose transition rules use only equalities and disequalities and have no transition quantifiers [6]. Since even such simple classes of alternating automata have no general decision procedure for emptiness, we use an abstraction-refinement semi-algorithm based on *lazy annotation* [20,21]. In a nutshell, a lazy annotation procedure systematically explores the set of finite input event sequences, searching for an accepting execution. For an input sequence, if the acceptance formula is satisfiable, we compute a word in the language of the automaton from a model of that formula. Otherwise, the sequence is *spurious*: the search backtracks and each position in the sequence is annotated with an interpolant, thus marking the sequence as infeasible. The semi-algorithm moreover uses a coverage relation between sequences, which ensures that the continuations of already covered sequences are never explored. When the automaton is empty, this coverage relation sometimes provides a sound termination argument.

For two input event sequences $\alpha, \beta \in \Sigma^{\ast}$, we say that $\alpha$ is a prefix of $\beta$, written $\alpha \preceq \beta$, if $\beta = \alpha\gamma$ for some sequence $\gamma \in \Sigma^{\ast}$; we write $\alpha \prec \beta$ if, moreover, $\alpha \neq \beta$. A set $S$ of sequences is *prefix-closed* if, for each $\alpha \in S$, $\beta \preceq \alpha$ implies $\beta \in S$, and *complete* if, for each $\alpha \in S$, whenever $\alpha a \in S$ for some $a \in \Sigma$, then $\alpha b \in S$ for all $b \in \Sigma$. A prefix-closed set is the backbone of a tree whose edges are labeled with input events. If the set is, moreover, complete, then every node of the tree either has zero successors, in which case it is called a *leaf*, or has a successor edge labeled with $a$ for each input event $a \in \Sigma$.
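Both closure conditions, and the leaves of the resulting tree, are easy to check on a finite set of sequences. A Python sketch, representing sequences as tuples of event names (an encoding assumed for illustration):

```python
def is_prefix_closed(S):
    """Every prefix of every member of S belongs to S."""
    return all(alpha[:k] in S for alpha in S for k in range(len(alpha)))


def is_complete(S, sigma):
    """Whenever some one-event extension of alpha is in S, all of them are."""
    for alpha in S:
        present = [alpha + (b,) in S for b in sigma]
        if any(present) and not all(present):
            return False
    return True


def leaves(S, sigma):
    """Members of S with no successor edge in the tree backbone."""
    return {alpha for alpha in S if not any(alpha + (b,) in S for b in sigma)}


sigma = ("a", "b")
S = {(), ("a",), ("b",), ("a", "a"), ("a", "b")}
print(is_prefix_closed(S), is_complete(S, sigma))  # prints True True
print(sorted(leaves(S, sigma)))                    # prints [('a', 'a'), ('a', 'b'), ('b',)]
```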

**Definition 2.** *An* unfolding *of an automaton* $\mathcal{A} = \langle \Sigma, X, Q, \iota, F, \Delta \rangle$ *is a finite partial mapping* $U : \Sigma^{\ast} \rightharpoonup_{\mathit{fin}} \mathrm{Form}^+(Q, \emptyset)$*, whose domain* $\mathrm{dom}(U)$ *is a finite prefix-closed complete set, such that* $U(\varepsilon) = \iota$*, and for each sequence* $\alpha a \in \mathrm{dom}(U)$*, such that* $\alpha \in \Sigma^{\ast}$ *and* $a \in \Sigma$*:*

$$U(\alpha)^{(0)} \wedge \bigwedge_{q(\mathbf{y}) \xrightarrow{a(X)} \psi \in \Delta} \forall y_1 \ldots \forall y_{\#(q)} \,.\, q^{(0)}(\mathbf{y}) \to \psi^{(1)} \models U(\alpha a)^{(1)}$$

*A path* $\alpha$ *is* safe *in* $U$ *if and only if* $U(\alpha) \wedge \bigwedge_{q \in Q \setminus F} \forall y_1 \ldots \forall y_{\#(q)} \,.\, q(\mathbf{y}) \to \bot$ *is unsatisfiable. The unfolding* $U$ *is safe if and only if every path in* $\mathrm{dom}(U)$ *is safe in* $U$*.*

Lazy annotation semi-algorithms [20,21] build unfoldings of automata, trying to discover counterexamples for emptiness. If the automaton $\mathcal{A}$ in question is non-empty, a systematic enumeration of the input event sequences<sup>2</sup> from $\Sigma^{\ast}$ suffices to discover a word $w \in \mathcal{L}(\mathcal{A})$, provided that the first-order theory of the data domain $D$ is decidable (Lemma 2). However, if $\mathcal{L}(\mathcal{A}) = \emptyset$, the enumeration of input event sequences may, in principle, run forever. The standard way to counter this divergence is to define a *coverage* relation between the nodes of the unfolding tree.
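Footnote 2 suggests breadth-first search for this enumeration. A Python sketch that lists the sequences of $\Sigma^{\ast}$ shortest first, together with a driver that stops at the first sequence for which an oracle answers yes. The oracle `accepts_some_word` is a hypothetical placeholder for a satisfiability check (for instance, of the predicate-free acceptance formula), and the explicit `limit` reflects that the search need not terminate when the language is empty:

```python
from itertools import count, islice, product


def sequences_bfs(sigma):
    """Yield every finite sequence over sigma, shortest first (empty sequence first)."""
    for n in count(0):
        yield from product(sigma, repeat=n)


def first_accepted(sigma, accepts_some_word, limit):
    """First sequence, in BFS order, on which the oracle holds; None after `limit` tries."""
    for alpha in islice(sequences_bfs(sigma), limit):
        if accepts_some_word(alpha):
            return alpha
    return None


print(list(islice(sequences_bfs(["a1", "a2"]), 7)))
# [(), ('a1',), ('a2',), ('a1', 'a1'), ('a1', 'a2'), ('a2', 'a1'), ('a2', 'a2')]
```

Without coverage, `first_accepted` can only give up at the bound; the coverage relation defined next is what allows a lazy annotation procedure to stop with a sound emptiness verdict.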

**Definition 3.** *Given an unfolding* $U$ *of an automaton* $\mathcal{A} = \langle \Sigma, X, Q, \iota, F, \Delta \rangle$*, a node* $\alpha \in \mathrm{dom}(U)$ *is* covered *by another node* $\beta \in \mathrm{dom}(U)$*, denoted* $\alpha \sqsubseteq \beta$*, if and only if there exists a node* $\alpha' \preceq \alpha$ *such that* $U(\alpha') \models U(\beta)$*. Moreover,* $U$ *is* closed *if and only if every leaf from* $\mathrm{dom}(U)$ *is covered by an uncovered node.*

A lazy annotation semi-algorithm stops and reports emptiness provided that it succeeds in building a closed and safe unfolding of the automaton. Notice that, by Definition 3, for any three nodes $\alpha, \beta, \gamma \in \mathrm{dom}(U)$ of an unfolding $U$, if $\alpha \prec \beta$ and $\alpha \sqsubseteq \gamma$, then $\beta \sqsubseteq \gamma$ as well. As we show next (Theorem 2), there is no need to expand covered nodes because, intuitively, there exists a word $w \in \mathcal{L}(\mathcal{A})$ such that $\alpha \preceq w_\Sigma$ and $\alpha \sqsubseteq \gamma$ only if there exists another word $u \in \mathcal{L}(\mathcal{A})$ such that $\gamma \preceq u_\Sigma$. Hence, exploring only those input event sequences that are continuations of $\gamma$ (and ignoring those of $\alpha$) suffices to find a counterexample for emptiness, if one exists.

An unfolding node $\alpha \in \mathrm{dom}(U)$ is said to be *spurious* if and only if $\Upsilon(\alpha)$ is unsatisfiable. In this case, we change (refine) the labels of (some of the) prefixes of $\alpha$, and that of $\alpha$ itself, such that $U(\alpha)$ becomes $\bot$, thus indicating that there is no real execution of the automaton along that input event sequence. As a result of the change of labels, a node $\gamma \preceq \alpha$ that used to cover another node from $\mathrm{dom}(U)$ might no longer cover it with the new label. Therefore, the coverage relation has to be recomputed after each refinement of the labeling. The semi-algorithm stops when (and if) a safe closed unfolding has been found.

**Theorem 2.** *If an automaton* $\mathcal{A}$ *has a nonempty safe closed unfolding, then* $\mathcal{L}(\mathcal{A}) = \emptyset$*.*

<sup>2</sup> For instance, using breadth-first search.

#### **Algorithm 1.** IMPACT-based Semi-algorithm for First Order Alternating Automata

    input: a first order alternating automaton A = ⟨Σ, X, Q, ι, F, Δ⟩
    output: ⊤ if L(A) = ∅, or a word w ∈ L(A), otherwise
    data structures: WorkList and unfolding tree U = ⟨N, E, r, U, ◁⟩, where:
      – N is a set of nodes,
      – E ⊆ N × Σ × N is a set of edges labeled by input events,
      – U : N → Form⁺(Q, ∅) is a labeling of nodes with positive sentences,
      – ◁ ⊆ N × N is a coverage relation,
    initially WorkList = {r} and N = E = U = ◁ = ∅.

     1: while WorkList ≠ ∅ do
     2:     dequeue n from WorkList
     3:     N ← N ∪ {n}
     4:     let α(n) be a_1, ..., a_k
     5:     if Ῡ(α)(X^(1), ..., X^(k)) is satisfiable then            ▷ counterexample is feasible
     6:         get model ν of Ῡ(α)(X^(1), ..., X^(k))
     7:         return w = (a_1, ν(X^(1))) ... (a_k, ν(X^(k)))         ▷ w ∈ L(A) by construction
     8:     else                                                        ▷ spurious counterexample
     9:         let (I_0, ..., I_k) be a GLI for α
    10:         b ← ⊥
    11:         for i = 0, ..., k do
    12:             if U(n_i) ⊭ I_i then
    13:                 Uncover ← {m ∈ N | (m, n_i) ∈ ◁}
    14:                 ◁ ← ◁ \ {(m, n_i) | m ∈ Uncover}              ▷ uncover the nodes covered by n_i
    15:                 for m ∈ Uncover such that m is a leaf of U do
    16:                     enqueue m into WorkList                     ▷ reactivate uncovered leaves
    17:                 U(n_i) ← U(n_i) ∧ J_i                           ▷ strengthen the label of n_i (Lemma 7)
    18:                 if ¬b then
    19:                     b ← Close(n_i)
    20:     if n is not covered then
    21:         for a ∈ Σ do                                            ▷ expand n
    22:             let s be a fresh node and e = (n, a, s) be a new edge
    23:             E ← E ∪ {e}
    24:             U ← U ∪ {(s, ⊤)}
    25:             enqueue s into WorkList
    26: return ⊤
    27: function Close(x) returns B
    28:     for y ∈ N such that α(y) ≺* α(x) do
    29:         if U(x) ⊨ U(y) then
    30:             ◁ ← [◁ \ {(p, q) ∈ ◁ | q is x or a successor of x}] ∪ {(x, y)}
    31:             return ⊤
    32:     return ⊥

We describe the semi-algorithm used to check emptiness of first-order alternating automata. The execution of Algorithm 1 consists of three phases, corresponding to the Close, Refine and Expand steps of the original IMPACT procedure [20]. Let $n$ be a node removed from the worklist at line 2 and let $\alpha(n)$ be the input sequence labeling the path from the root node to $n$. If $\overline{\Upsilon}(\alpha(n))$ is satisfiable, the sequence $\alpha(n)$ is feasible, in which case a model of $\overline{\Upsilon}(\alpha(n))$ is obtained and a word $w \in \mathcal{L}(\mathcal{A})$ is returned. Otherwise, $\alpha(n)$ is an infeasible input sequence and the procedure enters the refinement phase (lines 9–19). The GLI for $\alpha(n)$ is used to strengthen the labels of all the ancestors of $n$, by conjoining the formulae of the interpolant, changed according to Lemma 7, to the existing labels.

In this process, the nodes on the path between $r$ and $n$, including $n$, might become eligible for coverage; therefore, we attempt to close each ancestor of $n$ that is impacted by the refinement (line 19). Observe that, in this case, the call to Close must uncover each node that is covered by $n_i$ or by one of its successors (line 30 of the Close function). This is required because, due to the over-approximation of the sets of reachable configurations, the covering relation is not transitive, as explained in [20]. If Close adds a covering edge $(n_i, m)$ to $\lhd$, it does not have to be called for the successors of $n_i$ on this path, which is handled via the boolean flag $b$. Finally, if $n$ is still uncovered (i.e., it has not been covered during the refinement phase), we expand $n$ (lines 21–25) by creating, for each input event $a \in \Sigma$, a fresh successor node $s$ and inserting it into the worklist.

#### **4 Interpolant Generation**

Typically, when checking the unreachability of a set of program configurations, the interpolants used to annotate the unfolded control structure are assertions about the values of the program variables in a given control state, at a certain step of an execution [20]. Because we consider alternating computation trees (forests), we must distinguish between (i) locality of interpolants w.r.t. a given control state (control locality) and (ii) locality w.r.t. a given time stamp (time locality). In logical terms, *control-local* interpolants are formulae involving a single predicate symbol, whereas *time-local* interpolants involve only predicates $q^{(i)}$ and variables $x^{(i)}$, for a single $i \ge 0$. When considering alternating executions, control-local interpolants are not always enough to prove emptiness, because several branches of the computation synchronize on the same input word. For this reason, the interpolants considered in this paper are not required to be control-local, and we shall use the term *local* to denote time-local interpolants with no free variables.

First, let us give the formal definition of the class of interpolants we shall work with. Given a formula $\phi$, the *vocabulary* of $\phi$, denoted $\mathcal{V}(\phi)$, is the set of predicate symbols $q \in Q^{(i)}$ and variables $x \in X^{(i)}$ occurring in $\phi$, for some $i \ge 0$. For a term $t$, its vocabulary $\mathcal{V}(t)$ is the set of variables that occur in $t$. Observe that quantified variables and the interpreted function symbols of the data theory<sup>3</sup> do not belong to the vocabulary of a formula. By $\mathcal{P}^+(\phi)$ [$\mathcal{P}^-(\phi)$] we denote the set of predicate symbols that occur in $\phi$ under an even [odd] number of negations.

<sup>3</sup> E.g., the arithmetic operators of addition and multiplication, when D is the set of integers.
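The polarity sets are computed by tracking the parity of negations on the way down, reading an implication $f \to g$ as $\neg f \vee g$. A Python sketch over nested tuples (an encoding assumed for illustration; interpreted atoms are reduced to the variables they mention, and every free variable is assumed to be a time-stamped input variable):

```python
def polarity_and_vocabulary(phi):
    """Return (P+, P-, V) for a formula encoded as nested tuples.

    ("pred", q, vars)   predicate atom      ("rel", vars)   interpreted atom
    ("not", f), ("and", f, g), ("or", f, g), ("implies", f, g),
    ("forall", x, f), ("exists", x, f), ("true",), ("false",)
    V collects predicate symbols and free variables; bound variables are excluded.
    """
    pos, neg, voc = set(), set(), set()

    def walk(f, positive, bound):
        tag = f[0]
        if tag == "pred":
            (pos if positive else neg).add(f[1])
            voc.add(f[1])
            voc.update(v for v in f[2] if v not in bound)
        elif tag == "rel":
            voc.update(v for v in f[1] if v not in bound)
        elif tag == "not":
            walk(f[1], not positive, bound)
        elif tag in ("and", "or"):
            walk(f[1], positive, bound)
            walk(f[2], positive, bound)
        elif tag == "implies":          # f -> g  is read as  (not f) or g
            walk(f[1], not positive, bound)
            walk(f[2], positive, bound)
        elif tag in ("forall", "exists"):
            walk(f[2], positive, bound | {f[1]})
        elif tag not in ("true", "false"):
            raise ValueError(f"unknown connective: {tag!r}")

    walk(phi, True, frozenset())
    return pos, neg, voc


phi = ("implies", ("pred", "q", ("x1",)), ("forall", "z", ("pred", "p", ("z", "x1"))))
pos, neg, voc = polarity_and_vocabulary(phi)
print(sorted(pos), sorted(neg), sorted(voc))  # prints ['p'] ['q'] ['p', 'q', 'x1']
```

A predicate symbol may belong to both sets, for instance in $q(x) \wedge \neg q(y)$.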

**Definition 4 ([19]).** *Given formulae* $\phi$ *and* $\psi$ *such that* $\phi \wedge \psi$ *is unsatisfiable, a* Lyndon interpolant *is a formula* $I$ *such that* $\phi \models I$*, the formula* $I \wedge \psi$ *is unsatisfiable,* $\mathcal{V}(I) \subseteq \mathcal{V}(\phi) \cap \mathcal{V}(\psi)$*,* $\mathcal{P}^+(I) \subseteq \mathcal{P}^+(\phi) \cap \mathcal{P}^-(\psi)$ *and* $\mathcal{P}^-(I) \subseteq \mathcal{P}^-(\phi) \cap \mathcal{P}^+(\psi)$*.*

In the rest of this section, fix an automaton $\mathcal{A} = \langle \Sigma, X, Q, \iota, F, \Delta \rangle$. The following definition generalizes interpolants from unsatisfiable conjunctions to input sequences:

**Definition 5.** *Given a sequence of input events* $\alpha = a_1 \ldots a_n \in \Sigma^{\ast}$*, a* generalized Lyndon interpolant (GLI) *is a sequence* $(I_0, \ldots, I_n)$ *of formulae such that, for all* $k \in [n-1]$*, the following hold: (1)* $\mathcal{P}^-(I_k) = \emptyset$*, (2)* $\iota^{(0)} \models I_0$ *and* $I_k \wedge \bigwedge_{q(\mathbf{y}) \xrightarrow{a_{k+1}(X)} \psi \in \Delta} \forall y_1 \ldots \forall y_{\#(q)} \,.\, q^{(k)}(\mathbf{y}) \to \psi^{(k+1)} \models I_{k+1}$*, and (3)* $I_n \wedge \bigwedge_{q \in Q \setminus F} \forall y_1 \ldots \forall y_{\#(q)} \,.\, q^{(n)}(\mathbf{y}) \to \bot$ *is unsatisfiable. Moreover, the GLI is* local *if and only if* $\mathcal{V}(I_k) \subseteq Q^{(k)}$*, for all* $k \in [n]$*.*

The following proposition states the existence of local GLIs for the theories in which Lyndon's Interpolation Theorem holds.

**Proposition 1.** *If there exists a Lyndon interpolant for any two formulae* $\phi$ *and* $\psi$*, in the first-order theory of data with uninterpreted predicate symbols, such that* $\phi \wedge \psi$ *is unsatisfiable, then any sequence of input events* $\alpha = a_1 \ldots a_n \in \Sigma^{\ast}$ *such that* $\Upsilon(\alpha)$ *is unsatisfiable has a local GLI* $(I_0, \ldots, I_n)$*.*

A problematic point of the above proposition is that the existence of Lyndon interpolants (Definition 4) is proved in principle, but the proof is nonconstructive. In other words, the proof of Proposition 1 does not yield an algorithm for computing GLIs, for the following reason. Building an interpolant for an unsatisfiable conjunction of formulae $\phi \wedge \psi$ is typically the job of the decision procedure that proves the unsatisfiability and, in general, there is no such procedure when $\phi$ and $\psi$ contain predicates and have non-trivial quantifier alternation. In this case, some provers use instantiation heuristics for the universal quantifiers that suffice for proving unsatisfiability; however, these heuristics are not always suitable for interpolant generation. Consequently, from now on, we assume the existence of an effective Lyndon interpolation procedure only for decidable theories, such as the quantifier-free fragment of linear integer (real) arithmetic with uninterpreted functions (UFLIA, UFLRA, etc.) [26].

This is where the predicate-free path formulae (defined in Sect. 2.1) come into play. Recall that, for a given event sequence $\alpha$, the automaton $\mathcal{A}$ accepts a word $w$ such that $w_\Sigma = \alpha$ if and only if $\overline{\Upsilon}(\alpha)$ is satisfiable (Lemma 5). Assuming further that the equality atoms and the interpreted predicate atoms (e.g. inequalities over integers) in the transition rules of $\mathcal{A}$ belong to a decidable first-order theory, such as Presburger arithmetic, Lemma 5 gives us an effective way of checking emptiness of $\mathcal{A}$ relative to a given event sequence. However, this method does not cope well with lazy annotation, because there is no way to extract, from the unsatisfiability proof of $\overline{\Upsilon}(\alpha)$, the interpolants needed to annotate $\alpha$. This is because (I) the formula $\overline{\Upsilon}(\alpha)$, obtained by repeated substitutions, loses track of the steps of the execution, and (II) quantifiers that occur nested in $\overline{\Upsilon}(\alpha)$ make it difficult to write $\overline{\Upsilon}(\alpha)$ as an unsatisfiable quantifier-free conjunction of formulae from which interpolants are extracted (Definition 4).

The solution we adopt for the first issue (I) consists in partially recovering the time-stamped structure of the acceptance formula Υ(α) using the formula Υ(α), in which only transition quantifiers occur. The second issue (II) is solved under the additional assumption that the theory of the data domain D has *witness-producing quantifier elimination*. More precisely, we assume that, for each formula ∃x.φ(x), there exists an effectively computable term τ, in which x does not occur, such that ∃x.φ and φ[τ/x] are equisatisfiable. These terms, called *witness terms* in the following, are actual definitions of the Skolem function symbols from the following folklore theorem:

**Theorem 3 (**[3]**).** *Given a first-order sentence* $Q_1x_1 \ldots Q_nx_n\,.\,\phi$*, where* $Q_1,\ldots,Q_n \in \{\exists, \forall\}$ *and* $\phi$ *is quantifier-free, let* $\eta_i \stackrel{\mathrm{def}}{=} f_i(y_1,\ldots,y_{k_i})$ *if* $Q_i = \forall$ *and* $\eta_i \stackrel{\mathrm{def}}{=} x_i$ *if* $Q_i = \exists$*, where* $f_i$ *is a fresh function symbol and* $\{y_1,\ldots,y_{k_i}\} = \{x_j \mid j < i,\ Q_j = \exists\}$*. Then the entailment* $Q_1x_1 \ldots Q_nx_n\,.\,\phi \models \phi[\eta_1/x_1,\ldots,\eta_n/x_n]$ *holds.*
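The substitution in Theorem 3 is mechanical, so a short Python sketch may help: it computes the terms η_i for a given quantifier prefix. Terms are plain strings, and the fresh symbols are named `f1`, `f2`, … after the position of their quantifier; this naming scheme is our own illustration and not part of the theorem.

```python
from typing import Dict, List, Tuple

# A quantifier prefix Q1 x1 ... Qn xn, e.g. [("exists", "x1"), ("forall", "x2")].
Prefix = List[Tuple[str, str]]


def witness_substitution(prefix: Prefix) -> Dict[str, str]:
    """Return the map x_i -> eta_i of Theorem 3.

    Existential variables are kept (eta_i = x_i); each universal x_i becomes a
    fresh function symbol f_i applied to the existential variables preceding it.
    """
    eta: Dict[str, str] = {}
    preceding_exists: List[str] = []
    for i, (quant, var) in enumerate(prefix, start=1):
        if quant == "exists":
            eta[var] = var
            preceding_exists.append(var)
        elif quant == "forall":
            fresh = f"f{i}"  # hypothetical name for the fresh symbol f_i
            if preceding_exists:
                eta[var] = f"{fresh}({', '.join(preceding_exists)})"
            else:
                eta[var] = fresh  # nullary function symbol, i.e. a constant
        else:
            raise ValueError(f"unknown quantifier {quant!r}")
    return eta


if __name__ == "__main__":
    prefix = [("exists", "x1"), ("forall", "x2"), ("exists", "x3"), ("forall", "x4")]
    print(witness_substitution(prefix))
    # {'x1': 'x1', 'x2': 'f2(x1)', 'x3': 'x3', 'x4': 'f4(x1, x3)'}
```

Each universal variable depends only on the existential variables to its left, which is exactly the set $\{x_j \mid j < i,\ Q_j = \exists\}$ in the theorem.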

Examples of witness-producing quantifier elimination procedures can be found in the literature for, e.g., linear integer (real) arithmetic (LIA, LRA), Presburger arithmetic, and the boolean algebra of sets with Presburger cardinality constraints (BAPA) [18].
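As a one-line illustration of a witness term (our own example, not one taken from [18]), take linear real arithmetic and the formula ∃x.(a ≤ x ∧ x ≤ b) with a and b free. The lower bound a can serve as τ, because both formulas reduce to the same constraint:

$$\exists x\,.\,(a \le x \wedge x \le b) \;\equiv\; a \le b, \qquad (a \le x \wedge x \le b)[a/x] \;=\; (a \le a \wedge a \le b) \;\equiv\; a \le b$$

With several bounds on x, a witness generally has to select among them (for non-strict bounds, the largest lower bound works), which is where the cited procedures do the real work.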

Under the assumption that witness terms can be effectively built, we describe the generation of a non-local GLI for a given input event sequence $\alpha = a_1 \ldots a_n$. First, we successively generate the acceptance formula $\Upsilon(\alpha)$ and its equisatisfiable forms $\Upsilon(\alpha) = Q_1x_1 \ldots Q_mx_m\,.\,\Phi$ and $\Upsilon(\alpha) = Q_1x_1 \ldots Q_mx_m\,.\,\Phi$, both written in prenex form, with matrices $\Phi$ and $\Phi$, respectively. Because we assumed that the first-order theory of D has quantifier elimination, the satisfiability problem for $\Upsilon(\alpha)$ is decidable. If $\Upsilon(\alpha)$ is satisfiable, we build a counterexample to emptiness w such that $w_\Sigma = \alpha$ and $w_D$ is a satisfying assignment for $\Upsilon(\alpha)$. Otherwise, $\Upsilon(\alpha)$ is unsatisfiable and there exist witness terms $\tau_{i_1}, \ldots, \tau_{i_\ell}$, where $\{i_1,\ldots,i_\ell\} = \{j \in [1,m] \mid Q_j = \forall\}$, such that $\Phi[\tau_{i_1}/x_{i_1},\ldots,\tau_{i_\ell}/x_{i_\ell}]$ is unsatisfiable (Theorem 3). It then turns out that the formula $\Phi[\tau_{i_1}/x_{i_1},\ldots,\tau_{i_\ell}/x_{i_\ell}]$, obtained analogously from the matrix of $\Upsilon(\alpha)$, is unsatisfiable as well (Lemma 6 below). Because this latter formula is structured as a conjunction $\iota^{(0)} \wedge \phi_1 \wedge \ldots \wedge \phi_n \wedge \psi$, where $V(\phi_k) \cap Q^{(\le n)} \subseteq Q^{(k-1)} \cup Q^{(k)}$ and $V(\psi) \cap Q^{(\le n)} \subseteq Q^{(n)}$, we can now use an existing interpolation procedure for the quantifier-free theory of D, extended with uninterpreted function symbols, to compute a (not necessarily local) GLI $(I_0,\ldots,I_n)$ such that $V(I_k) \cap Q^{(\le n)} \subseteq Q^{(k)}$, for all $k \in [n]$.

*Example 3* (*Contd. from Examples* 1 *and* 2*).* The formula Υ(α) (Example 2) is unsatisfiable; let $\tau_2 \stackrel{\mathrm{def}}{=} z_1$ be the witness term for the universally quantified variable $z_2$. Replacing $z_2$ with $\tau_2$ (i.e., $z_1$) in the matrix of Υ(α) (Example 1) yields the following unsatisfiable conjunction, after trivial simplifications:

$$\begin{array}{c} \left[ z\_1 \ge 0 \land q^{(0)}(z\_1) \right] \land \left[ q^{(0)}(z\_1) \to x^{(1)} \ge 0 \land q^{(1)}(x^{(1)} + z\_1) \right] \land \\\left[ q^{(1)}(x^{(1)} + z\_1) \to x^{(1)} + z\_1 < 0 \land q^{(2)}(x^{(2)} + x^{(1)} + z\_1) \right] \end{array}$$

A non-local GLI for the above conjunction is the sequence of formulae:

$$\{q^{(0)}(z\_1) \land z\_1 \ge 0, \ x^{(1)} \ge 0 \land q^{(1)}(x^{(1)} + z\_1) \land z\_1 \ge 0, \ \bot\}$$
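The entailments behind Example 3 can be sanity-checked by brute force. In the Python sketch below (our own illustration), the three predicate atoms $q^{(0)}(z_1)$, $q^{(1)}(x^{(1)}+z_1)$ and $q^{(2)}(\ldots)$ are treated as independent Boolean variables, which is exact here because each predicate symbol is applied to a single argument term; $x^{(2)}$ is dropped since it occurs only inside the $q^{(2)}$ atom; and integers range over a small bounded interval. Only the usual sequence-interpolant conditions are checked (the first conjunct entails $I_0$, each $I_{k-1}$ together with the next conjunct entails $I_k$, and $I_2 = \bot$); the further requirements of Definition 4 are not, and a bounded check is not a proof.

```python
from itertools import product

INTS = range(-5, 6)   # bounded integer range (a sanity check, not a proof)
BOOLS = (False, True)


def states():
    """All valuations of z1, x1 and the three abstracted predicate atoms."""
    for z1, x1, q0, q1, q2 in product(INTS, INTS, BOOLS, BOOLS, BOOLS):
        yield {"z1": z1, "x1": x1, "q0": q0, "q1": q1, "q2": q2}


# The three bracketed conjuncts of the formula above.
def init(s):   # z1 >= 0 and q0(z1)
    return s["z1"] >= 0 and s["q0"]


def step1(s):  # q0(z1) -> x1 >= 0 and q1(x1 + z1)
    return (not s["q0"]) or (s["x1"] >= 0 and s["q1"])


def step2(s):  # q1(x1 + z1) -> x1 + z1 < 0 and q2(x2 + x1 + z1)
    return (not s["q1"]) or (s["x1"] + s["z1"] < 0 and s["q2"])


# The interpolant sequence (I0, I1, I2).
def I0(s):
    return s["q0"] and s["z1"] >= 0


def I1(s):
    return s["x1"] >= 0 and s["q1"] and s["z1"] >= 0


def I2(s):
    return False  # bottom


def entails(premise, conclusion):
    """premise |= conclusion over the bounded valuation space."""
    return all(conclusion(s) for s in states() if premise(s))


if __name__ == "__main__":
    # All four checks print True.
    print(not any(init(s) and step1(s) and step2(s) for s in states()))
    print(entails(init, I0))
    print(entails(lambda s: I0(s) and step1(s), I1))
    print(entails(lambda s: I1(s) and step2(s), I2))
```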

We now formalize the above construction of non-local GLIs and prove its correctness. A function $\xi : \mathbb{N} \to \mathbb{N}$ is *monotonic* iff $\xi(n) \le \xi(m)$ for all $n < m$, and *finite-range* iff, for each $n \in \mathbb{N}$, the set $\{m \mid \xi(m) = n\}$ is finite. If $\xi$ is finite-range, we denote by $\xi^{-1}_{\max}(n) \in \mathbb{N}$ the maximal value $m$ such that $\xi(m) = n$.
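For a function on a finite domain, as in Lemma 6 where ξ maps [1, m] to [n], both properties and $\xi^{-1}_{\max}$ are easy to compute. A minimal Python sketch, assuming ξ is given as a dict from domain points to values (our own encoding):

```python
from typing import Dict, Optional


def is_monotonic(xi: Dict[int, int]) -> bool:
    """xi(n) <= xi(m) whenever n < m; checking consecutive points suffices."""
    keys = sorted(xi)
    return all(xi[a] <= xi[b] for a, b in zip(keys, keys[1:]))


def inv_max(xi: Dict[int, int], n: int) -> Optional[int]:
    """xi^{-1}_max(n): the largest m with xi(m) = n, or None if n is not hit.

    On a finite domain every function is finite-range, so the maximum
    exists whenever the preimage of n is non-empty.
    """
    preimage = [m for m, v in xi.items() if v == n]
    return max(preimage) if preimage else None


if __name__ == "__main__":
    xi = {1: 0, 2: 1, 3: 1, 4: 2}   # four quantifiers mapped to indices 0..2
    print(is_monotonic(xi), inv_max(xi, 1))   # True 3
```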

**Lemma 6.** *Given a non-empty input event sequence* $\alpha = a_1 \ldots a_n \in \Sigma^*$ *such that* $\Upsilon(\alpha)$ *is unsatisfiable, let* $Q_1x_1 \ldots Q_mx_m\,.\,\Phi$ *be a prenex form of* $\Upsilon(\alpha)$ *and let* $\xi : [1,m] \to [n]$ *be a monotonic finite-range function mapping each transition quantifier to the minimal index from the sequence* $\Theta(\alpha_0), \ldots, \Theta(\alpha_n)$ *where it occurs. Then one can effectively build:*


Consequently, under two assumptions about the first-order theory of the data domain, namely (i) witness-producing quantifier elimination and (ii) Lyndon interpolation for the quantifier-free fragment with uninterpreted functions, we have developed a generic method that produces GLIs for infeasible input event sequences. Moreover, each formula in the interpolant refers only to the current predicate symbols, the current and past input variables, and the existentially quantified transition variables introduced at the previous steps. The remaining questions are how to use these GLIs to label the sequences in the unfolding of an automaton (Definition 2) and how to compute coverage (Definition 3) between nodes of the unfolding.

#### **4.1 Unfolding with Non-local Interpolants**

As required by Definition 2, the unfolding U of an automaton $A = \langle \Sigma, X, Q, \iota, F, \Delta \rangle$ is labeled by formulae $U(\alpha) \in \mathrm{Form}^+(Q, \emptyset)$ with no free symbols other than predicate symbols, such that the labeling is compatible with the transition relation of the automaton. Each newly expanded input sequence of A is initially labeled with $\top$, and the labels are refined using GLIs computed from proofs of spuriousness. The following lemma describes the refinement of the labeling of an input sequence by a non-local GLI:

**Lemma 7.** *Let* U *be an unfolding of an automaton* $A = \langle \Sigma, X, Q, \iota, F, \Delta \rangle$ *such that* $\alpha = a_1 \ldots a_n \in \mathrm{dom}(U)$ *and* $(I_0,\ldots,I_n)$ *is a GLI for* α*. Then the mapping* $U' : \mathrm{dom}(U) \to \mathrm{Form}^+(Q, \emptyset)$ *is an unfolding of* A*, where:*


*Moreover,* α *is safe in* U′*.*

Observe that, by Lemma 6(2), the set of free variables of a GLI formula $I_k$ consists of (i) variables $X^{(\le k)}$ that keep track of data values seen in the input at some earlier moment in time, and (ii) variables that track past choices made within the transition rules. It does not matter exactly when in the past a certain input was read or a choice was made, because only the relation between these values and the current variables determines the future behavior of the automaton. Existentially quantifying these variables abstracts away the exact moments at which these values were seen. Moreover, the last point of Lemma 7 ensures that the refined path is safe in the new unfolding and stays safe in all future refinements of this unfolding.

The last ingredient of the lazy annotation semi-algorithm based on unfoldings is the implementation of the coverage check, when the unfolding of an automaton is labeled with conjunctions of existentially quantified formulae with predicate symbols, obtained from interpolation. By Definition 3, checking whether a given node $\alpha \in \mathrm{dom}(U)$ is covered amounts to finding a prefix α′ of α and a node $\beta \in \mathrm{dom}(U)$ such that $U(\alpha') \models U(\beta)$, or equivalently, such that the formula $U(\alpha') \wedge \neg U(\beta)$ is unsatisfiable. However, the latter formula, in prenex form, has a quantifier prefix in the language $\exists^*\forall^*$ and, as previously mentioned, the satisfiability problem for such formulae becomes undecidable when the data theory subsumes Presburger arithmetic [10].

Nevertheless, if we require just a yes/no answer (i.e., not an interpolant), recently developed quantifier instantiation heuristics [25] perform rather well on a large number of queries in this class. Observe, moreover, that coverage does not need to rely on a complete decision procedure. If the prover fails to answer the above satisfiability query, the semi-algorithm assumes that the node is not covered and continues exploring its successors. Failure to detect coverage may lead to divergence (non-termination) and, ultimately, to failure to prove emptiness, but it does not affect the soundness of the semi-algorithm (real counterexamples will still be found).
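The policy of treating an inconclusive coverage query as "not covered" can be outlined in a few lines. This hypothetical Python sketch is not the tool's code: `entails` stands for a prover call that returns `True` when the entailment is proved (i.e., $U(\alpha') \wedge \neg U(\beta)$ is unsatisfiable), `False` when it is refuted, and `None` when the prover gives up; labels are opaque values, and the candidate nodes β are assumed to be chosen by the caller according to any further conditions of Definition 3.

```python
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple

Word = Tuple[str, ...]                               # an input event sequence
Prover = Callable[[object, object], Optional[bool]]  # None means "gave up"


def prefixes(alpha: Word) -> Iterable[Word]:
    """All prefixes of alpha, from the empty word up to alpha itself."""
    for k in range(len(alpha) + 1):
        yield alpha[:k]


def is_covered(alpha: Word, labels: Dict[Word, object],
               candidates: Sequence[Word], entails: Prover) -> bool:
    """True only if entails(U(alpha'), U(beta)) is proved for some prefix
    alpha' of alpha and some candidate beta.

    An inconclusive answer (None) is treated like a refuted one: the node
    counts as not covered and its successors are explored. This may cost
    termination, but never soundness.
    """
    for a_prefix in prefixes(alpha):
        if a_prefix not in labels:
            continue
        for beta in candidates:
            if entails(labels[a_prefix], labels[beta]) is True:
                return True
    return False
```

Comparing against `True` explicitly, rather than relying on truthiness, keeps the three prover outcomes distinct at the one place where the decision is made.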

#### **5 Experimental Results**

We have implemented a version of the IMPACT semi-algorithm [20] in a prototype tool, available online [8]. The tool is written in Java and uses the Z3 SMT solver [27], via the JavaSMT interface [15], for spuriousness and coverage queries and also for interpolant generation. Table 1 reports the size of the input automaton in bytes, the numbers of Predicates, Variables and Transitions, the result of the emptiness check, the numbers of Expanded and Visited Nodes during the unfolding, and the Time in milliseconds. The experiments were carried out on a MacOS x64 machine with a 1.3 GHz Intel Core i5 and 8 GB of 1867 MHz LPDDR3 memory.

**Table 1.** Experiments with First Order Alternating Automata

The test cases shown in Table 1 come from several sources, namely predicate automata models (\*.pa) [6,7] available online [23], timed automata inclusion problems (abp.ada, train.ada, rr-crossing.foada), array logic entailments (array rotation.ada, array simple.ada, array shift.ada) and hardware circuit verification (hw1.ada, hw2.ada), initially considered in [13], with the restriction that local variables are made visible in the input. The train-simpleN.foada and fischer-mutexN.foada examples are parametric verification problems in which one checks inclusions of the form <sup>N</sup> <sup>i</sup>=1 <sup>L</sup>(Ai) <sup>⊆</sup> <sup>L</sup>(B), where $A_i$ is the $i$-th copy of the template automaton.

The advantage of using FOADA over the INCLUDER [12] tool from [13] is the possibility of having automata over infinite alphabets with local variables, whose values are not visible in the input. In particular, this is essential for checking inclusion of timed automata that use internal clocks to control the computation.

#### **6 Conclusions**

We present first-order alternating automata, a model of computation that generalizes classical boolean alternating automata to first-order theories. Due to their expressivity, first-order alternating automata are closed under union, intersection and complement. However, the emptiness problem is undecidable even in the simplest case, that of the quantifier-free theory of equality with uninterpreted predicate symbols. We deal with the emptiness problem by developing a practical semi-algorithm that always terminates when the automaton is not empty. In case of emptiness, the semi-algorithm terminates in most practical test cases, as shown by a number of experiments.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Q3B: An Efficient BDD-based SMT Solver for Quantified Bit-Vectors**

Martin Jonáš(B) and Jan Strejček

Masaryk University, Brno, Czech Republic {xjonas,strejcek}@fi.muni.cz

**Abstract.** We present the first stable release of our tool Q3B for deciding satisfiability of quantified bit-vector formulas. Unlike other state-of-the-art solvers for this problem, Q3B is based on translation of a formula to a bdd that represents models of the formula. The tool also employs advanced formula simplifications and approximations by effective bit-width reduction and by abstraction of bit-vector operations. The paper focuses on the architecture and implementation aspects of the tool, and provides a brief experimental comparison with its competitors.

#### **1 Introduction**

Advances in solving formula *satisfiability modulo theories* (smt) achieved during the last few decades have enabled significant progress and practical applications in the area of automated analysis, testing, and verification of various systems. In the case of software and hardware systems, the most relevant theory is the *theory of fixed-sized bit-vectors*, as these systems work with inputs expressed as bit-vectors (i.e., sequences of bits) and perform bitwise and arithmetic operations on bit-vectors. The quantifier-free fragment of this theory is supported by many general-purpose smt solvers, such as CVC4 [1], MathSAT [7], Yices [10], or Z3 [9], and also by several dedicated solvers, such as Boolector [21] or STP [12]. However, there are some use-cases where quantifier-free formulas are not natural or expressive enough. For example, formulas containing quantifiers arise naturally when expressing loop invariants, ranking functions, loop summaries, or when checking equivalence of two symbolically described sets of states [8,13,17,18,24]. In the following, we focus on smt solvers for *quantified* bit-vector formulas. In particular, this paper describes the state-of-the-art smt solver Q3B, including its implementation and inner workings.

Solving of quantified bit-vector formulas was first supported by Z3 in 2013 [25] and for a limited set of *exists/forall* formulas with only a single quantifier alternation by Yices in 2015 [11]. Both of these solvers decide quantified formulas by *quantifier instantiation*, in which universally quantified variables in the Skolemized formula are repeatedly instantiated by ground terms until the resulting quantifier-free formula is unsatisfiable or a model of the original formula is found.

© The Author(s) 2019. I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 64–73, 2019. https://doi.org/10.1007/978-3-030-25543-5\_4

This work has been supported by Czech Science Foundation, grant GA18-02177S.

In 2016, we proposed a different approach for solving quantified bit-vector formulas: using binary decision diagrams (bdds) and approximations [14]. To evaluate this approach, we implemented an experimental smt solver called Q3B, which outperformed both Z3 and Yices. The next solver able to solve quantified bit-vector formulas was Boolector in 2017, also using an approach based on quantifier instantiation [22]. Unlike Z3, in which the universally quantified variables are instantiated only by constants or subterms of the original formula, Boolector uses a counterexample-guided synthesis approach, in which a suitable ground term for instantiation is synthesized from a given grammar. Thanks to this, Boolector was able to outperform Q3B and Z3 on certain classes of formulas. More recently, in 2018, support for quantified bit-vector formulas has also been implemented in CVC4 [20]. The approach of CVC4 is also based on quantifier instantiation, but instead of synthesizing terms from a grammar as Boolector does, CVC4 uses predetermined rules based on invertibility conditions, which directly give terms that can prune many spurious models without potentially expensive counterexample-guided synthesis. The authors of CVC4 have shown that this approach outperforms Z3, Boolector, and the original Q3B. However, Q3B has been substantially improved since the original experimental version. In 2017, we extended it with simplifications of quantified bit-vector formulas using unconstrained variables [15]. Further, in 2018, we added the experimental implementation of abstractions of bit-vector operations [16]. With these techniques, Q3B is able to decide more formulas than Z3, Boolector, and CVC4. Besides the theoretical improvements, Q3B was also improved in terms of stability, ease of use, technical parts of the implementation, and compliance with the smt-lib standard.
This tool paper presents the result of these improvements: Q3B 1.0, the first stable version of Q3B.

We briefly summarize the smt solving approach of Q3B. As in most modern smt solvers, the input formula is first simplified using satisfiability-preserving transformations that may reduce the size and complexity of the formula. The simplified formula is then converted to a binary decision diagram (bdd) that represents all assignments satisfying the formula, i.e., the *models* of the formula. If the bdd represents at least one model, we say that the bdd is *satisfiable*, which implies satisfiability of the formula. If the bdd represents the empty set of models, we say that it is *unsatisfiable*, and so is the formula. Unfortunately, there are formulas for which the corresponding bdd (or some of the intermediate bdds that appear during its computation) is necessarily exponential in the number of bits in the formula. For example, this is the case for formulas that contain multiplication of two bit-vector variables [5]. To be able to deal with such formulas, Q3B also computes, in parallel, bdds underapproximating and overapproximating the original set of models, i.e., bdds representing subsets and supersets of the original set of models, respectively. The approximating bdds may be much smaller than the precise bdd, especially if the approximation is very rough. Still, they can be used to decide satisfiability of the original formula. If an overapproximating bdd is unsatisfiable, the original formula is also unsatisfiable. If the overapproximating bdd is satisfiable, we take one of its models, i.e., an assignment to the top-level existential variables of the formula, and check whether it is a model of the original formula. If the answer is positive, the original formula is satisfiable. Otherwise, we build a more precise overapproximating bdd. Underapproximating bdds are utilized analogously.
The only difference is that for an unsatisfiable underapproximating bdd, we check the validity of a countermodel, i.e., an assignment to the top-level universal variables that makes the formula unsatisfiable. The approach is depicted in Fig. 1.
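The interplay of the overapproximation, its model check, and the underapproximation can be illustrated on a toy scale. The Python sketch below is our own simplification, not Q3B's implementation: explicit sets of integers stand in for bdds, the formula has the fixed shape ∃x ∀y. φ(x, y) over 6-bit unsigned values, approximations reduce the effective bit-width of one variable (upper bits fixed to zero), the approximations are tried sequentially rather than in parallel, and the countermodel check for underapproximations is omitted.

```python
from typing import Callable, Set

Phi = Callable[[int, int], bool]  # matrix of the formula  exists x. forall y. phi(x, y)
W = 6                             # full bit-width of x and y (tiny, so brute force is cheap)


def models(phi: Phi, x_bits: int, y_bits: int) -> Set[int]:
    """Values of x with x_bits effective bits (upper bits zero) such that
    phi(x, y) holds for every y with y_bits effective bits."""
    return {x for x in range(1 << x_bits)
            if all(phi(x, y) for y in range(1 << y_bits))}


def is_model(phi: Phi, x: int) -> bool:
    """Check a candidate x against the original, full-width formula."""
    return all(phi(x, y) for y in range(1 << W))


def solve(phi: Phi) -> str:
    for k in range(1, W + 1):
        # Fewer values for the universal y: a superset of the models.
        over = models(phi, W, k)
        if not over:
            return "unsat"   # even the superset is empty
        if is_model(phi, min(over)):
            return "sat"     # the candidate survives the precise check
        # Fewer values for the existential x: a subset of the models.
        if models(phi, k, W):
            return "sat"     # a non-empty subset proves satisfiability
    # Unreachable: at k == W the overapproximation is exact.
    raise AssertionError("unreachable")


if __name__ == "__main__":
    print(solve(lambda x, y: x >= y))        # sat (x = 63)
    print(solve(lambda x, y: x != y))        # unsat
    print(solve(lambda x, y: (x & y) == 0))  # sat (x = 0)
```

Each loop iteration corresponds to one refinement step that increases the effective bit-width k; at k = W both approximations coincide with the precise set, so the loop always returns.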

**Fig. 1.** High-level overview of the smt solving approach used by Q3B. The three shaded areas are executed in parallel and the first result is returned.

Q3B currently supports two ways of computing the approximating bdds from the input formula. The first are *variable bit-width approximations*, in which the *effective bit-width* of some variables is reduced. In other words, some of the variables are represented by fewer bits, and the remaining bits are set to zero bits, one bits, or the sign bit of the reduced variable. This approach was originally used by the smt solvers uclid [6] and Boolector [21]. Q3B extends this approach to quantified formulas: if bit-widths of only existentially quantified variables are reduced, the resulting bdd is underapproximating; if bit-widths of only universally quantified variables are reduced, the resulting bdd is overapproximating. The second way to obtain an approximation is *bit-vector operation abstraction* [16], during which the individual bit-vector operations may not compute all bits of the result, but produce some *do-not-know bits* if the resulting bdds would exceed a given number of nodes. An underapproximating bdd then represents assignments that satisfy the formula for all possible values of these do-not-know bits. Analogously, an overapproximating bdd represents all assignments that satisfy the formula for some value of the do-not-know bits. Q3B also supports a combination of these two methods, in which both the effective bit-width of variables is reduced and a limit on the size of bdds is imposed. During an approximation refinement, either the effective bit-width or the size limit is increased, based on the detected cause of the imprecision.
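The three ways of filling the bits above a reduced width can be shown with a small Python helper. The function name and the `mode` strings are our own; the sketch only shows how a k-bit value is extended to the full width, not how Q3B builds the corresponding bdds.

```python
def extend(value: int, k: int, width: int, mode: str) -> int:
    """Extend a k-bit unsigned value to `width` bits.

    mode "zero": the remaining bits are 0;
    mode "one":  the remaining bits are 1;
    mode "sign": the remaining bits copy bit k-1, the sign bit of the reduced value.
    """
    if not 0 < k <= width:
        raise ValueError("need 0 < k <= width")
    if not 0 <= value < (1 << k):
        raise ValueError("value does not fit in k bits")
    upper = ((1 << width) - 1) ^ ((1 << k) - 1)  # mask of the bits above position k-1
    if mode == "zero":
        return value
    if mode == "one":
        return value | upper
    if mode == "sign":
        return (value | upper) if (value >> (k - 1)) & 1 else value
    raise ValueError(f"unknown mode {mode!r}")


if __name__ == "__main__":
    # k = 3, width = 8: 0b101 becomes 5, 253 and 253 under the three modes.
    print([extend(0b101, 3, 8, m) for m in ("zero", "one", "sign")])
```

Enumerating all k-bit values through one of these modes yields the reduced domain of a variable, which is what makes the resulting bdd smaller.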

**Fig. 2.** Architecture of Q3B. Components in the shaded box are parts of Q3B, the other components are external.

#### **2 Architecture**

This section describes the internal architecture of Q3B. The overall structure including internal and external components and the interactions between them is depicted in Fig. 2. We explain the purpose of the internal components:

**SMT-LIB Interpreter** (implemented in SMTLIBInterpreter.cpp) reads the input file in the smt-lib format [3], which is the standard input format for smt solvers. The interpreter executes all the commands from the file. In particular, it maintains the assertion stack and the options set by the user, calls Solver when the check-sat command is issued, and queries Solver if the user requests a model with the get-model command.


ing and underapproximating bdds. Precision of approximations depends on parameters set by the solver component.

**Cache** (implemented as a part of ExprToBDDTransformer.cpp) maintains for each converted subformula and subterm the corresponding bdd or a vector of bdds, respectively. Each of the three solvers has its own cache. When an approximating solver increases precision of the approximation, entries of its cache that can be affected by the precision change are invalidated. All the caches are internally implemented by hash-tables.
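A cache with selective invalidation of this kind can be sketched as a dictionary whose entries remember what they depend on. In the hypothetical Python sketch below, each entry records the set of variables whose approximation settings influenced it, and raising the precision for some variables drops exactly the entries that mention them; this dependency criterion is our assumption, not a description of Q3B's code.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, Tuple


class ApproxCache:
    """Per-solver memo table: subterm -> (result, variables it depends on)."""

    def __init__(self) -> None:
        self._entries: Dict[Hashable, Tuple[object, FrozenSet[str]]] = {}

    def get_or_compute(self, term: Hashable, depends_on: Iterable[str],
                       compute: Callable[[], object]) -> object:
        """Return the cached result for `term`, computing it on a miss."""
        if term in self._entries:
            return self._entries[term][0]
        result = compute()
        self._entries[term] = (result, frozenset(depends_on))
        return result

    def invalidate(self, changed_vars: Iterable[str]) -> int:
        """Drop entries affected by a precision change; return how many."""
        changed = set(changed_vars)
        stale = [t for t, (_, deps) in self._entries.items() if deps & changed]
        for t in stale:
            del self._entries[t]
        return len(stale)

    def __len__(self) -> int:
        return len(self._entries)
```

Keeping one such object per solver mirrors the separation described above: refining the overapproximating solver never evicts entries of the precise or underapproximating one.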

#### **3 Implementation**

Q3B is implemented in C++17, is open-source, and is available under the MIT license on GitHub: https://github.com/martinjonas/Q3B. The project development process includes continuous integration and automatic regression tests.

Q3B relies on several external libraries and tools. For representing and manipulating bdds, Q3B uses the open-source library cudd 3.0 [23]. Since cudd does not support bit-vector operations, we use the library by Peter Navrátil [19] that implements bit-vector operations on top of cudd. The algorithms in this library are inspired by the ones in the bdd library BuDDy<sup>1</sup> and they provide decent performance. Nevertheless, we have further improved its performance by several modifications. In particular, we added specific code for handling expensive operations like bit-vector multiplication and division when arguments contain constant bdds. This, for example, considerably speeds up multiplication whenever one argument contains many constant zero bits, which is a frequent case when we use the variable bit-width approximation fixing some bits to zero. Further, we have fixed a few incorrectly implemented bit-vector operations in the original library. Finally, we have extended the library with support for do-not-know bits in inputs of the bit-vector operations, and we have implemented abstract versions of arithmetic operations that can produce do-not-know bits when the result exceeds a given number of bdd nodes.

For parsing the input formulas in the smt-lib format, Q3B uses an antlr parser generated from the grammar<sup>2</sup> for smt-lib 2.6 [2]. We have modified the grammar to correctly handle bit-vector numerals and to support push and pop commands without a numerical argument. The parser allows Q3B to support all bit-vector operations and almost all smt-lib commands except get-assertions, get-assignment, get-proof, get-unsat-assumptions, get-unsat-core, and all the commands that work with algebraic datatypes. This is in sharp contrast with the previous experimental versions of Q3B, which only collected all the assertions from the input file and performed the satisfiability check regardless of the rest of the commands and of the presence of the check-sat command. The reason for this was that the older versions parsed the input file using the Z3 C++ api, which can provide only the list of assertions, not the rest of the smt-lib script. Thanks to the new parser, Q3B 1.0 can also provide the user with a model of a satisfiable formula after calling get-model; this important feature of other smt solvers was completely missing in the previous versions.

<sup>1</sup> https://sourceforge.net/projects/buddy/.

<sup>2</sup> https://github.com/julianthome/smtlibv2-grammar.

On the other hand, the C++ api of Z3 is still used for the internal representation of parsed formulas. The Z3 C++ api is also used to perform manipulations with formulas, such as substitution of values for variables, and some of the formula simplifications. Note that these are the only uses of the Z3 api in Q3B while solving a formula; no actual smt- or sat-solving capabilities of Z3 are used during the solving process.

Some classes of Q3B, in particular Solver, FormulaSimplifier, and UnconstrainedVariableSimplifier, expose a public C++ api that can be used by external tools for smt solving or just performing formula simplifications. For example, Solver exposes method Solve(formula, approximationType), which can be used to decide satisfiability by the precise solver, the underapproximating solver, or the overapproximating solver. Solver also exposes the method SolveParallel(formula), which simplifies the input formula and runs all three of these solvers in parallel and returns the first result as depicted in Fig. 1.
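The "first result wins" behaviour of SolveParallel can be mimicked with Python's `concurrent.futures`. This is only an analogy under our own assumptions: each solver is a callable returning `"sat"`, `"unsat"`, or `"unknown"`, inconclusive answers are ignored, and Q3B's C++ API is not reproduced. Python threads cannot be forcibly stopped, so slower solvers keep running in the background after the answer has been returned.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Sequence

Solver = Callable[[str], str]  # formula -> "sat" | "unsat" | "unknown"


def solve_parallel(solvers: Sequence[Solver], formula: str) -> str:
    """Run all solvers concurrently and return the first conclusive answer."""
    pool = ThreadPoolExecutor(max_workers=len(solvers))
    try:
        pending = {pool.submit(solver, formula) for solver in solvers}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                result = future.result()
                if result != "unknown":
                    return result   # first conclusive answer wins
        return "unknown"            # every solver gave up
    finally:
        # Return immediately; slower solvers finish in the background.
        pool.shutdown(wait=False)


if __name__ == "__main__":
    def precise(formula: str) -> str:
        time.sleep(0.3)     # pretend the precise bdd is expensive
        return "unsat"

    def under(formula: str) -> str:
        return "unknown"    # approximation too coarse to decide

    def over(formula: str) -> str:
        time.sleep(0.05)
        return "unsat"      # empty overapproximation

    print(solve_parallel([precise, under, over], "phi"))  # unsat
```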

#### **4 Experimental Evaluation**

We have evaluated the performance of Q3B 1.0 and compared it to the latest versions of the smt solvers Boolector (v3.0), CVC4 (v1.6), and Z3 (v4.8.4). All tools were used with their default settings except for CVC4, for which we used the same settings as in the paper that introduces quantified bit-vector solving in CVC4 [20], since they give better results than the default CVC4 settings. As the benchmark set, we have used all 5751 quantified bit-vector formulas from the smt-lib repository. The benchmarks are divided into 8 distinct families of formulas. We have executed each solver on each benchmark with a cpu time limit of 20 min and a ram limit of 8 GiB. All the experiments were performed in an Ubuntu 16.04 virtual machine on a computer equipped with an Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz and 32 GiB of ram. For reliable benchmarking, we employed BenchExec [4], a tool that allocates specified resources for a program execution and precisely measures their usage. All scripts used for running benchmarks and processing their results, together with detailed descriptions and some additional results not presented in the paper, are available online<sup>3</sup>.

Table 1 shows the numbers of benchmarks in each benchmark family solved by the individual solvers. Q3B solves the most benchmarks in the benchmark families *2017-Preiner-scholl-smt08*, *2017-Preiner-tptp*, *2017-Preiner-UltimateAutomizer*, *2018-Preiner-cav18*, and *wintersteiger*, and it is competitive in the remaining families. In total, Q3B also solves more formulas than each of the other solvers: 116 more than Boolector, 83 more than CVC4, and 139 more than Z3. Although the numbers of solved formulas seem fairly similar across the solvers, the cross-comparison in Table 2 shows that the differences among the individual solvers are actually larger. For each other solver, there are at least 143 benchmarks that can be solved by Q3B but not by the other solver. We think this shows the importance of developing an smt solver based on bdds and approximations alongside the solvers based on quantifier instantiation.

<sup>3</sup> https://github.com/martinjonas/q3b-artifact.

**Table 1.** For each solver and benchmark family, the table shows the number of benchmarks from the given family solved by the given solver. The column *Total* shows the total number of benchmarks in the given family. The last line provides the total cpu times for the benchmarks solved by all four solvers.

**Table 2.** For all pairs of the solvers, the table shows the number of benchmarks that were solved by the solver in the corresponding row, but not by the solver in the corresponding column. The column *Uniquely solved* shows the number of benchmarks that were solved only by the given solver.

#### **5 Conclusions and Future Work**

We have described the architecture and inner workings of the first stable version of the state-of-the-art smt solver Q3B. Experimental evaluation on all quantified bit-vector formulas from smt-lib repository shows that this solver slightly outperforms other state-of-the-art solvers for such formulas.

As future work, we would like to drop the dependency on the Z3 api, namely to implement our own representation of formulas and to reimplement all the simplifications currently outsourced to the Z3 api directly in Q3B. We also plan to extend some simplifications with the additional bookkeeping needed to construct a model of the original formula. With these extensions, all simplifications could be used even if the user wants to get a model of the formula. We would also like to implement the production of unsatisfiable cores, since they are also valuable for software verification.

#### **References**



## **CVC4SY: Smart and Fast Term Enumeration for Syntax-Guided Synthesis**

Andrew Reynolds<sup>1</sup>, Haniel Barbosa<sup>1</sup>, Andres Nötzli<sup>2</sup>(B), Clark Barrett<sup>2</sup>, and Cesare Tinelli<sup>1</sup>

<sup>1</sup> The University of Iowa, Iowa City, USA <sup>2</sup> Stanford University, Stanford, USA noetzli@cs.stanford.edu

**Abstract.** We present CVC4SY, a syntax-guided synthesis (SyGuS) solver based on three bounded term enumeration strategies. The first encodes term enumeration as an extension of the quantifier-free theory of algebraic datatypes. The second is based on a highly optimized brute-force algorithm. The third combines elements of the others. Our implementation of the strategies within the satisfiability modulo theories (SMT) solver CVC4 and a heuristic to choose between them leads to significant improvements over state-of-the-art SyGuS solvers.

#### **1 Introduction**

Syntax-guided synthesis (SyGuS) [3] is a recent paradigm for program synthesis, successfully used for applications in formal verification and programming languages. Most SyGuS solvers perform counterexample-guided inductive synthesis (CEGIS) [16]: a refinement loop in which a learner proposes solutions, and a verifier, generally a satisfiability modulo theories (SMT) solver [8,9], checks them and provides counterexamples for failures. Generally, the learner enumerates some set of terms, while pruning spurious ones [17]. The simplicity and efficacy of enumerative SyGuS have made it the de facto approach for SyGuS, although alternatives exist for restricted fragments [4,14].

In previous work [14], we have shown how the SMT solver CVC4 [5] can itself act as an efficient synthesizer. This tool paper focuses on recent advances in the enumerative subsolver of CVC4, culminating in the current SyGuS solver CVC4SY. Figure 1 shows its main components. The term enumerator is parameterized by an enumeration strategy chosen before solving: CVC4SY S, whose constraint-based (smart) enumeration allows for numerous optimizations (Sect. 2); CVC4SY F, based on a new approach for (fast) enumerative synthesis (Sect. 3), which has significant advantages with respect to the enumerative solver CVC4SY S and other state-of-the-art approaches; and CVC4SY H, based on a hybrid approach combining smart and fast enumeration (Sect. 4). All strategies are fully integrated in CVC4, meaning they support inputs in many background theories, including arithmetic, bit-vectors, strings, and floating point. We evaluate these approaches on a large set of benchmarks (Sect. 5).

**Fig. 1.** Architecture of CVC4SY.

*The Problem.* A syntax-guided synthesis problem for a function $f$ in a background theory $T$ consists of a set of semantic restrictions, or specification, for $f$ given by a (second-order) $T$-formula of the form $\exists f.\,\varphi[f]$, and a set of syntactic restrictions on the solutions for $f$, typically expressed as a context-free grammar. An *enumerative* approach to this problem combines a *term enumerator* and a *solution verifier* for solving synthesis conjectures. The role of the term enumerator is to output a stream of terms $t_1, t_2, \ldots$ over some tuple $\bar{x}$ of variables representing the inputs of $f$, where each $t_i[\bar{x}]$ is a candidate solution. The role of the solution verifier is to check for each $t_i$ whether it is a solution for $f$ by determining if the negated conjecture $\neg\varphi[\lambda\bar{x}.\,t_i]$ is unsatisfiable.

*Bounded* term generation considers terms based on an ordering such as term size (the number of non-nullary symbols in a term). For each $k = 0, 1, 2, \ldots$, the term enumerator outputs a *finite* set $S^k$ of terms, each of size at most $k$. Bounded term generation in CVC4SY is *complete* in the sense that, for any $k$, if $f$ has a solution of size at most $k$, then at least one of the terms in $S^k$ is a solution for $f$. The effectiveness of an approach for (complete) bounded term generation can be evaluated based on two criteria: (i) the number of terms it generates and (ii) the rate at which it generates them.
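To make the enumerate-and-verify loop and bounded generation concrete, here is a minimal Python sketch under several assumptions of our own: terms come from a small integer grammar with terminals 0, 1, x, y and the binary operators + and −; the specification $f(x, y) = x + y + 1$ is hypothetical; and the verifier is a brute-force check over a small integer domain standing in for the SMT solver. As in CEGIS, counterexamples returned by the verifier are kept as refinement points and used to discard later candidates without calling the verifier.

```python
from functools import lru_cache

LEAVES = (0, 1, "x", "y")      # nullary rules (assumed toy grammar)
BINOPS = ("plus", "minus")     # binary rules I + I and I - I

@lru_cache(maxsize=None)
def terms_of_size(k):
    """All terms with exactly k non-nullary symbols."""
    if k == 0:
        return LEAVES
    return tuple((op, a, b)
                 for op in BINOPS
                 for k1 in range(k)                   # child sizes k1 + k2 = k - 1
                 for a in terms_of_size(k1)
                 for b in terms_of_size(k - 1 - k1))

def evaluate(t, x, y):
    if t == "x":
        return x
    if t == "y":
        return y
    if isinstance(t, int):
        return t
    op, a, b = t
    va, vb = evaluate(a, x, y), evaluate(b, x, y)
    return va + vb if op == "plus" else va - vb

def spec(value, x, y):
    """Hypothetical specification: f(x, y) = x + y + 1."""
    return value == x + y + 1

def verify(t, domain=range(-3, 4)):
    """Stand-in verifier: brute force over a small domain.
    Returns None if t meets the spec there, else a counterexample point."""
    for x in domain:
        for y in domain:
            if not spec(evaluate(t, x, y), x, y):
                return (x, y)
    return None

def enumerative_cegis(max_size=3):
    points = []                                      # refinement points so far
    for k in range(max_size + 1):                   # bounded: sizes 0, 1, 2, ...
        for t in terms_of_size(k):
            if any(not spec(evaluate(t, *p), *p) for p in points):
                continue                             # refuted by a known point
            cex = verify(t)
            if cex is None:
                return t, points
            points.append(cex)
    return None, points

t, points = enumerative_cegis()
```

With these choices the loop returns a size-2 term equivalent to $x + y + 1$ after collecting two refinement points; every later candidate of size 0 and 1 is discarded by the first point alone.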

We follow two approaches for enumerative SyGuS in CVC4SY, each optimized for one of the criteria above: a *smart* approach and a *fast* one. The first aims to generate, reasonably quickly, the smallest set of terms that maintains completeness, while the second aims to generate terms as quickly as possible.

*Technical Preliminaries.* As we showed in previous work [14], syntactic restrictions can be conveniently represented as a set of *(algebraic) datatypes*, for which some SMT solvers have dedicated decision procedures [7,13]. For instance, given a function $f : (x : \mathsf{Int}) \times (y : \mathsf{Int}) \to \mathsf{Int}$ and the context-free grammar $R$ below specifying which integer ($I$) and Boolean ($B$) terms can appear in candidate solutions for $f$:

$$I ::= 0 \mid 1 \mid x \mid y \mid I + I \mid I - I \mid \text{ite}(B, I, I) \tag{1}$$

$$B ::= I \geq I \mid I \approx I \mid \lnot B \mid B \land B \tag{2}$$

our SyGuS solver generates the following mutually recursive datatypes:

$$\mathcal{I} = \mathbf{0} \mid \mathbf{1} \mid \mathbf{x} \mid \mathbf{y} \mid \mathsf{plus}(\mathcal{I}, \mathcal{I}) \mid \mathsf{minus}(\mathcal{I}, \mathcal{I}) \mid \mathsf{ite}(\mathcal{B}, \mathcal{I}, \mathcal{I}) \tag{3}$$

$$\mathcal{B} = \mathsf{geq}(\mathcal{I}, \mathcal{I}) \mid \mathsf{eq}(\mathcal{I}, \mathcal{I}) \mid \mathsf{not}(\mathcal{B}) \mid \mathsf{and}(\mathcal{B}, \mathcal{B}) \tag{4}$$

Each datatype constructor corresponds to a production rule of $R$; e.g., plus corresponds to the rule $I ::= I + I$. A datatype term such as plus(x, y) represents the arithmetic term $x + y$. We will use these datatypes as a running example.

For a datatype term $t$, we write $\mathsf{is}_C(t)$ to denote the *discriminator* predicate that is satisfied exactly when $t$ is interpreted as a datatype value whose top constructor is $C$. We write $\mathsf{sel}^{\tau}_{n}(t)$ to denote a *shared selector* [15] applied to $t$, interpreted as the $n$th child of $t$ with type $\tau$ if one exists, and as an arbitrary element of $\tau$ otherwise. A term consisting of zero or more consecutive nested applications of shared selectors applied to a term $t$ is a *shared selector chain* for $t$.
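The datatype view can be mirrored with plain Python tuples. The sketch below uses our own encoding, not CVC4's data structures: it stores each constructor's argument types so that the discriminator $\mathsf{is}_C$ and the shared selector $\mathsf{sel}^{\tau}_{n}$ can be computed directly. Where the solver would leave an inapplicable selector unconstrained, the sketch simply returns `None`.

```python
# Signatures of the constructors of datatypes (3) and (4): (argument types, result type).
SIG = {
    "0": ((), "I"), "1": ((), "I"), "x": ((), "I"), "y": ((), "I"),
    "plus": (("I", "I"), "I"), "minus": (("I", "I"), "I"),
    "ite": (("B", "I", "I"), "I"),
    "geq": (("I", "I"), "B"), "eq": (("I", "I"), "B"),
    "not": (("B",), "B"), "and": (("B", "B"), "B"),
}

def is_c(c, t):
    """Discriminator is_C(t): holds iff the top constructor of t is c."""
    return t[0] == c

def sel(tau, n, t, default=None):
    """Shared selector sel^tau_n(t): the n-th child of t whose type is tau (1-based).
    In the solver the result would otherwise be arbitrary; here we return default."""
    arg_types = SIG[t[0]][0]
    children = [child for ty, child in zip(arg_types, t[1:]) if ty == tau]
    return children[n - 1] if n <= len(children) else default

def sel_chain(t, chain):
    """Apply a shared selector chain, innermost selector first."""
    for tau, n in chain:
        t = sel(tau, n, t)
        if t is None:
            return None
    return t

INFIX = {"plus": "+", "minus": "-", "geq": ">=", "eq": "=", "and": "and"}

def to_theory(t):
    """Print the theory term that a datatype value represents."""
    c, args = t[0], t[1:]
    if not args:
        return c
    if c in INFIX:
        return f"({to_theory(args[0])} {INFIX[c]} {to_theory(args[1])})"
    return f"{c}(" + ", ".join(to_theory(a) for a in args) + ")"
```

For $u = \mathsf{ite}(\mathsf{geq}(\mathbf{x}, \mathbf{0}), \mathbf{x}, \mathsf{plus}(\mathbf{y}, \mathbf{1}))$, `sel("I", 2, u)` is the plus child and the chain `[("I", 2), ("I", 1)]` reaches $\mathbf{y}$.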

#### **2 Smart Enumerative SyGuS**

Our *smart enumerative SyGuS* approach, CVC4SY S, is based on finding solutions for an evolving set of constraints in an extension of the quantifier-free fragment of algebraic datatypes. These constraints are constructed to rule out many redundant solutions without overconstraining the problem, which could cause actual solutions to be missed.

In detail, candidate solutions for the function $f : \tau_1 \to \tau_2$ to be synthesized are constructed by maintaining a set of constraints $F$, initially empty, for a first-order variable $d$ ranging over the datatype representing $\tau_2$. For example, consider again the function $f$ with the syntactic restrictions expressed by the datatypes in Eqs. 3 and 4. If the term generator finds a model for $F$, it provides to the solution verifier the integer term that corresponds to the value of $d$ in the model; for example, it provides $x + 1$ when $d$ is interpreted as plus(x, 1). In turn, if the solution verifier finds that $x + 1$ is not a solution, it provides the *blocking constraint* $\lnot\mathsf{is}_{\mathsf{plus}}(d) \lor \lnot\mathsf{is}_{\mathbf{x}}(\mathsf{sel}^{\mathcal{I}}_{1}(d)) \lor \lnot\mathsf{is}_{\mathbf{1}}(\mathsf{sel}^{\mathcal{I}}_{2}(d))$, i.e., the datatype constraint that rules out the current value for $d$, which is then added to $F$. This is a *syntactic* constraint on future candidate solutions from the term generator. Its atoms are discriminators applied to shared selector chains.
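The blocking constraint for a fully determined value of $d$ has one literal per node: the discriminator of that node's constructor applied to the selector chain that reaches it. A minimal Python sketch, with a tuple encoding of datatype values that is our own assumption; `satisfies` checks whether a concrete candidate value is still allowed by the clause.

```python
SIG = {  # argument types of the constructors of datatypes (3) and (4)
    "0": (), "1": (), "x": (), "y": (),
    "plus": ("I", "I"), "minus": ("I", "I"), "ite": ("B", "I", "I"),
    "geq": ("I", "I"), "eq": ("I", "I"), "not": ("B",), "and": ("B", "B"),
}

def blocking_literals(t, chain=()):
    """Pairs (selector chain, constructor) for every node of the value t.
    The blocking constraint is the disjunction of the negated discriminators."""
    lits = [(chain, t[0])]
    seen = {}                                  # per-type child counter (shared selectors)
    for ty, child in zip(SIG[t[0]], t[1:]):
        seen[ty] = seen.get(ty, 0) + 1
        lits += blocking_literals(child, chain + ((ty, seen[ty]),))
    return lits

def follow(t, chain):
    """Value reached by the selector chain, or None once a selector does not apply."""
    for ty, n in chain:
        kids = [c for tt, c in zip(SIG[t[0]], t[1:]) if tt == ty]
        if n > len(kids):
            return None
        t = kids[n - 1]
    return t

def satisfies(t, lits):
    """Does t satisfy OR_i not is_Ci(u_i)?  If a selector does not apply, some
    ancestor literal already sees a different constructor, so we count it as true."""
    for chain, c in lits:
        u = follow(t, chain)
        if u is None or u[0] != c:
            return True
    return False

def show(lits):
    def chain_term(chain):
        s = "d"
        for ty, n in chain:
            s = f"sel^{ty}_{n}({s})"
        return s
    return " ∨ ".join(f"¬is_{c}({chain_term(ch)})" for ch, c in lits)
```

For the value plus(x, 1), `show(blocking_literals(...))` prints the clause given in the text, and only that exact value violates it.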

CVC4SY S uses a number of optimization techniques in addition to the basic loop above, which we describe in the remainder of this section. These techniques produce blocking constraints via the lemmas-on-demand paradigm [6] that eagerly rule out spurious candidates, *prior* to the solution verification step. Additionally, whenever possible, it *strengthens* blocking constraints via novel generalization techniques, with the effect of ruling out larger classes of candidates.

*Blocking via Theory Rewriting with Structural Generalization.* As we describe in previous work [14], the enumerative solver of CVC4 uses its rewriter as an oracle for discovering when candidate solutions are redundant. The motivation is that for any two equivalent terms $t$ and $s$, only one of them needs to be checked with the solution verifier, since either both $t$ and $s$ are solutions to the synthesis conjecture or neither is. Given a term $t$, we write $t{\downarrow}$ to denote its *rewritten form*. Note that it is possible for equivalent terms not to have the same rewritten form. This is a consequence of the trade-offs in the implementation of CVC4's rewriter, which must balance efficiency and completeness.

As an example, suppose that the term enumerator previously generated $x + y$ and that $d$'s current value is the datatype term representing $y + x$, where $(x + y){\downarrow} = (y + x){\downarrow}$. We first generate a blocking constraint template $R[z]$ of the form $\lnot\mathsf{is}_{\mathsf{plus}}(z) \lor \lnot\mathsf{is}_{\mathbf{y}}(\mathsf{sel}^{\mathcal{I}}_{1}(z)) \lor \lnot\mathsf{is}_{\mathbf{x}}(\mathsf{sel}^{\mathcal{I}}_{2}(z))$, where $z$ is a fresh variable. This template is subsequently instantiated with $z \mapsto u$ for any shared selector chain $u$ of type $\mathcal{I}$ that currently (or later) appears in $F$, starting with $d$ itself. This has the effect of ruling out all candidate solutions that have $y + x$ as a subterm, which is justified by the fact that each such term is equivalent to one in which all occurrences of $y + x$ are replaced by $x + y$.

We employ a refinement of this technique, which we call *theory rewriting with structural generalization*, that searches for and then blocks only the minimal skeleton of the term under test that is sufficient for determining its rewritten form. For example, consider the if-then-else term $t = \mathsf{ite}(x \approx 0 \land y \geq 0, 0, x)$. This term is equivalent to $x$, regardless of the value of the predicate $y \geq 0$. This can be confirmed by the rewriter by computing that $\mathsf{ite}(x \approx 0 \land w, 0, x){\downarrow} = x$, where $w$ is a fresh Boolean variable. Then, instead of generating a constraint that blocks only (the datatype value corresponding to) $t$, we generate a stronger constraint that does not depend on the subterm $y \geq 0$. In other words, this blocking constraint rules out all candidate solutions that contain the subterm $\mathsf{ite}(x \approx 0 \land w, 0, x)$, for *any* term $w$. We compute these generalizations using a recursive algorithm that iteratively replaces *each* subterm of the current candidate with a fresh variable and checks whether its rewritten form remains the same.
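One plausible way to implement this generalization loop is sketched below. The rewriter is a deliberately tiny stand-in (an assumption, not CVC4's rewriter) that knows only the rule $\mathsf{ite}(\ldots \land a \approx b \land \ldots, b, a) \to a$, which suffices for the example above; terms are Python tuples and fresh variables are written `("?", i)`. Subterms are tried top-down, and a subterm is replaced whenever doing so leaves the rewritten form unchanged; otherwise its children are tried.

```python
from itertools import count

def conjuncts(c):
    if isinstance(c, tuple) and c[0] == "and":
        return conjuncts(c[1]) + conjuncts(c[2])
    return [c]

def rewrite(t):
    """Toy rewriter (assumption): bottom-up, with the single rule
    ite(... and a = b and ..., b, a) -> a.  Fresh variables are ("?", i)."""
    if not isinstance(t, tuple) or t[0] == "?":
        return t
    t = (t[0],) + tuple(rewrite(a) for a in t[1:])
    if t[0] == "ite":
        cond, then_, else_ = t[1:]
        for c in conjuncts(cond):
            if isinstance(c, tuple) and c[0] == "eq" and c[1] == else_ and c[2] == then_:
                return else_
    return t

def replace_at(t, pos, new):
    """Replace the subterm at position pos (a path of child indices) by new."""
    if not pos:
        return new
    i = pos[0]
    return t[:i] + (replace_at(t[i], pos[1:], new),) + t[i + 1:]

def generalize(t, rw=rewrite):
    """Keep only the skeleton of t that determines its rewritten form."""
    target, fresh, cur = rw(t), count(), t
    def visit(sub, pos):
        nonlocal cur
        candidate = replace_at(cur, pos, ("?", next(fresh)))
        if rw(candidate) == target:
            cur = candidate                   # this whole subterm is irrelevant
        elif isinstance(sub, tuple):
            for i in range(1, len(sub)):      # otherwise try its children
                visit(sub[i], pos + (i,))
    visit(t, ())
    return cur

t = ("ite", ("and", ("eq", "x", 0), ("geq", "y", 0)), 0, "x")
g = generalize(t)
```

On the example term, `g` is $\mathsf{ite}(x \approx 0 \land w, 0, x)$ with $w$ a fresh variable, i.e., the skeleton discussed above.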

*Blocking via CEGIS with Structural Generalization.* Synthesis solvers based on CEGIS maintain a list of *refinement points* that witness the infeasibility of previous candidate solutions. That is, given a synthesis conjecture $\exists f.\,\forall \bar{x}.\,\varphi[f, \bar{x}]$, the solver maintains a growing list $\bar{p}_1, \ldots, \bar{p}_n$ of values for $\bar{x}$ that witness the infeasibility of previous candidates $u_1, \ldots, u_n$ for $f$. Then, when a new candidate $u$ is generated, we first check whether $\varphi[u, \bar{p}_i]$ is false for some $i \leq n$. When a candidate $u$ fails to satisfy $\varphi[u, \bar{p}_i]$, CVC4SY S further applies a form of generalization analogous to the structural generalization described above. We call this *CEGIS with structural generalization*, where the goal is to find the minimal skeleton of $u$ that also fails to satisfy some refinement point.

For example, suppose $f$ is the function to synthesize, $\varphi$ includes the constraint $f(x, y) \leq x - 1$, and $\bar{p}_1 = (3, 3)$ is a refinement point. Then, the candidate term $u[x, y] = \mathsf{ite}(x \geq 0, x, y + 1)$ will be discarded, because $\mathsf{ite}(3 \geq 0, 3, 4) \not\leq 2$. Notice, however, that *any* candidate $u' = \mathsf{ite}(x \geq 0, x, w)$ is falsified by $\bar{p}_1$, regardless of what $w$ is, since $u'[3, 3] \leq 2$ is equivalent to $3 \leq 2$. This indicates that we can block *all* ite candidate terms with condition $x \geq 0$ and true branch $x$. We can express this constraint in CVC4SY S by dropping the disjuncts that relate to the false branch of the ite term. This form of blocking is particularly useful when synthesizing multiple functions $(f_1, \ldots, f_n)$, since it is often the case that a candidate for a single $f_i$ is already sufficient to falsify the specification, regardless of what the candidates for the other functions are.
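A similar loop implements CEGIS with structural generalization if the rewriter check is replaced by a test against a refinement point. In the sketch below, "falsified for every $w$" is decided by partial evaluation, where an ite only looks at the branch its condition selects; this is a sound but incomplete test of our own choosing, not necessarily what CVC4 does. The constraint $f(x, y) \leq x - 1$ and the point $(3, 3)$ are those of the example above, and the hole marker `"?"` is our own notation for an arbitrary term $w$.

```python
HOLE = "?"   # stands for an arbitrary term w

def peval(t, env):
    """Partially evaluate t at point env; None means 'depends on a hole'."""
    if t == HOLE:
        return None
    if isinstance(t, int):
        return t
    if isinstance(t, str):
        return env[t]
    op, *args = t
    if op == "ite":                      # only the selected branch matters
        c = peval(args[0], env)
        return None if c is None else peval(args[1] if c else args[2], env)
    a, b = (peval(arg, env) for arg in args)
    if a is None or b is None:
        return None
    return {"plus": a + b, "minus": a - b, "geq": a >= b}[op]

def falsified(u, point):
    """f(x, y) <= x - 1 is false at point, whatever the holes stand for."""
    v = peval(u, point)
    return v is not None and not v <= point["x"] - 1

def replace_at(t, pos, new):
    if not pos:
        return new
    i = pos[0]
    return t[:i] + (replace_at(t[i], pos[1:], new),) + t[i + 1:]

def cegis_generalize(u, point):
    """Replace subterms by holes (top-down) while u stays falsified at point."""
    cur = u
    def visit(sub, pos):
        nonlocal cur
        candidate = replace_at(cur, pos, HOLE)
        if falsified(candidate, point):
            cur = candidate
        elif isinstance(sub, tuple):
            for i in range(1, len(sub)):
                visit(sub[i], pos + (i,))
    visit(u, ())
    return cur

u = ("ite", ("geq", "x", 0), "x", ("plus", "y", 1))
p1 = {"x": 3, "y": 3}
skeleton = cegis_generalize(u, p1)
```

The resulting skeleton keeps the condition and the true branch and drops the false branch, matching the constraint described above.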

*Evaluation Unfolding.* This technique uses *evaluation functions* to encode the relationship between the datatype terms assigned to $d$ and their analogs in the theory $T$. For example, the evaluation function for the datatype $\mathcal{I}$ defined in (3) is a function $\mathsf{E}_{\mathcal{I}} : \mathcal{I} \times \mathsf{Int} \times \mathsf{Int} \to \mathsf{Int}$ defined axiomatically so that $\mathsf{E}_{\mathcal{I}}(d, m, n)$ denotes the result of evaluating $d$ by interpreting any occurrences of $\mathbf{x}$ and $\mathbf{y}$ in $d$ as $m$ and $n$, respectively, and interpreting the other constructors as the corresponding arithmetic/Boolean operators; e.g., $\mathsf{E}_{\mathcal{I}}(\mathsf{minus}(\mathbf{x}, \mathbf{y}), 5, 3)$ is interpreted as $2$. When a refinement point $\bar{c}$ is generated, we add a constraint requiring that the evaluation of $d$ at $\bar{c}$ satisfy the specification. For example, for the conjecture $\exists f.\,\forall x.\, f(x + 1, x) \leq 0$ and the refinement point $x \mapsto 1$, we add the constraint $\mathsf{E}_{\mathcal{I}}(d, 2, 1) \leq 0$. Then, when a literal $\mathsf{is}_C(t)$ is asserted for a term $t$ of type $\mathcal{I}$, we can add a constraint corresponding to the one-step unfolding of the evaluation of $t$. Specifically, when $\mathsf{is}_{\mathsf{ite}}(d)$ is asserted, we generate the constraint

$$\mathsf{is}_{\mathsf{ite}}(d) \Rightarrow \mathsf{E}_{\mathcal{I}}(d,2,1) \approx \mathsf{ite}\big(\mathsf{E}_{\mathcal{B}}(\mathsf{sel}^{\mathcal{B}}_{1}(d),2,1),\ \mathsf{E}_{\mathcal{I}}(\mathsf{sel}^{\mathcal{I}}_{1}(d),2,1),\ \mathsf{E}_{\mathcal{I}}(\mathsf{sel}^{\mathcal{I}}_{2}(d),2,1)\big)$$

indicating that the evaluation of $d$ on point $(2, 1)$ indeed behaves like an ite term when $d$ has top symbol ite. Our implementation adds these constraints for all terms $t$ whose top symbols correspond to ite or Boolean connectives. For terms $t$ whose top symbol is any of the other operators, we add constraints corresponding to the total evaluation of $t$ once the value of $t$ is fully determined, for example, $t \approx \mathsf{plus}(\mathbf{x}, \mathbf{y}) \Rightarrow \mathsf{E}_{\mathcal{I}}(t, 2, 1) \approx 3$. Notice that this constraint with $t = d$, along with the refinement constraint $\mathsf{E}_{\mathcal{I}}(d, 2, 1) \leq 0$, suffices to show that $d$ cannot be $\mathsf{plus}(\mathbf{x}, \mathbf{y})$.
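In the solver, $\mathsf{E}_{\mathcal{I}}$ and $\mathsf{E}_{\mathcal{B}}$ are function symbols constrained by the lemmas above; on a fully determined datatype value they simply compute the value of the represented term. The Python sketch below, with our own tuple encoding, implements that computation, the right-hand side of the ite unfolding, and the refinement constraint $\mathsf{E}_{\mathcal{I}}(d, 2, 1) \leq 0$ from the running example. It only checks the lemmas on concrete values; it does not reproduce the solver's lazy, symbolic use of them.

```python
OPS = {
    "plus": lambda a, b: a + b, "minus": lambda a, b: a - b,
    "ite": lambda c, a, b: a if c else b,
    "geq": lambda a, b: a >= b, "eq": lambda a, b: a == b,
    "not": lambda a: not a, "and": lambda a, b: a and b,
}

def E(d, m, n):
    """E_I / E_B on a concrete datatype value, interpreting x as m and y as n."""
    c = d[0]
    if c == "x":
        return m
    if c == "y":
        return n
    if c in ("0", "1"):
        return int(c)
    return OPS[c](*(E(a, m, n) for a in d[1:]))

def unfold_ite(d, m, n):
    """Right-hand side of the unfolding lemma for is_ite(d):
    ite(E_B(sel^B_1(d)), E_I(sel^I_1(d)), E_I(sel^I_2(d))) at (m, n)."""
    cond, then_, else_ = d[1], d[2], d[3]   # sel^B_1, sel^I_1, sel^I_2 of an ite value
    return E(then_, m, n) if E(cond, m, n) else E(else_, m, n)

def refinement_ok(d):
    """Refinement point x -> 1 for f(x + 1, x) <= 0, i.e. E_I(d, 2, 1) <= 0."""
    return E(d, 2, 1) <= 0
```

For instance, `E(("plus", ("x",), ("y",)), 2, 1)` is 3, so `refinement_ok` rejects plus(x, y), mirroring the argument above.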

#### **3 Fast Enumerative SyGuS**

The techniques in the previous section prune the search space so that often only a small subset of all possible terms is considered for a given term size bound. The main bottleneck of this approach, however, is managing the large number of blocking constraints it generates. Moreover, its benefits are limited when the grammar or specification does not admit opportunities for generalization.

For this reason, we have also developed CVC4SY F, which, in the spirit of other SyGuS solvers (notably ESOLVER [17]), relies on a principled brute-force approach for term generation. In contrast to other solvers, however, which are built as layers on top of the core SMT reasoner, CVC4SY F is fully integrated as a subsolver of CVC4, so communication with other components has almost no overhead. This technique, *fast enumerative synthesis*, does not use constraint solving to generate new terms. As a result, the majority of optimizations from Sect. 2 are incompatible with it.

*Algorithm.* To generate terms up to a given size $k$, we maintain a set $S^k_\tau$ of terms of type $\tau$ and size $k$ for each datatype $\tau$ corresponding to a non-terminal symbol of our input grammar $R$. First, we compute for each such $\tau$ the set $C_\tau$ of its *constructor classes*, i.e., the classes of an equivalence relation over the constructors of $\tau$ that groups them by their type. For example, the constructor classes for $\mathcal{I}$ are $\{\mathbf{x}, \mathbf{y}, \mathbf{0}, \mathbf{1}\}$, $\{\mathsf{plus}, \mathsf{minus}\}$, and $\{\mathsf{ite}\}$. Then, we use the following procedure for generating all terms of size $k$ for type $\tau$:

FASTENUM(τ , k):

For all:


The recursive procedure FASTENUM($\tau$, $k$) populates the set $S^k_\tau$ of all terms of type $\tau$ with size $k$. These sets are cached globally. We incorporate an optimization that only adds terms $C(t_1, \ldots, t_n)$ to $S^k_\tau$ whose corresponding terms in the theory $T$ are unique up to rewriting. This mimics the effect of blocking via theory rewriting as described in Sect. 2. For example, plus(y, x) is not added to $S^1_{\mathcal{I}}$ if that set already contains plus(x, y), noting that $(x + y){\downarrow} = (y + x){\downarrow}$. By construction of $S^k_\tau$ for $k \geq 1$, this has the cascading effect of excluding all terms having $y + x$ as a subterm.
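A minimal Python sketch of a FASTENUM-style enumerator, based only on the prose description above (not the authors' exact procedure) and restricted to the integer fragment of datatype (3) without ite. The rewriter is a toy assumption that only sorts the arguments of plus, and the global set `SEEN` of rewritten forms implements uniqueness up to that rewriter. Each constructor class computes its child tuples once and shares them among its constructors.

```python
from functools import lru_cache
from itertools import product

# Integer fragment of datatype (3): constructor -> argument types (all of type I here).
SIG = {"0": (), "1": (), "x": (), "y": (), "plus": ("I", "I"), "minus": ("I", "I")}

# Constructor classes: constructors grouped by their type.
CLASSES = {}
for ctor, args in SIG.items():
    CLASSES.setdefault(args, []).append(ctor)

def rewrite(t):
    """Toy rewriter (assumption, far weaker than CVC4's): sorts the arguments of plus."""
    args = tuple(rewrite(a) for a in t[1:])
    if t[0] == "plus":
        args = tuple(sorted(args, key=repr))
    return (t[0],) + args

def compositions(total, parts):
    """All tuples of `parts` non-negative integers that sum to `total`."""
    if parts == 0:
        if total == 0:
            yield ()
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

SEEN = set()   # rewritten forms of every term kept so far, shared by all sizes

@lru_cache(maxsize=None)   # plays the role of the global cache of the sets S^k
def fast_enum(k):
    """S^k_I: terms of size k that are unique up to (toy) rewriting."""
    out = []
    for arg_types, ctors in CLASSES.items():
        if not arg_types:
            child_tuples = [()] if k == 0 else []
        elif k == 0:
            child_tuples = []
        else:
            # One enumeration of child tuples per class, shared by its constructors.
            child_tuples = [kids
                            for sizes in compositions(k - 1, len(arg_types))
                            for kids in product(*(fast_enum(s) for s in sizes))]
        for ctor in ctors:
            for kids in child_tuples:
                t = (ctor,) + kids
                r = rewrite(t)
                if r not in SEEN:          # skip terms equal to earlier ones up to rewriting
                    SEEN.add(r)
                    out.append(t)
    return tuple(out)
```

Here $S^1_{\mathcal{I}}$ contains plus(x, y) but not plus(y, x), and no term of size 2 has plus(y, x) as a child, which illustrates the cascading effect.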

We observe that theory rewriting with structural generalization cannot be easily incorporated into this scheme since it requires the use of a constraint solver, something that the above algorithm seeks to avoid.

#### **4 Hybrid Approach: Variable-Agnostic Enumerative SyGuS**

We follow a third approach, in solver CVC4SY H, that combines elements of the previous approaches. The idea is to use the (smart) approach from Sect. 2 to generate terms, but then generate *multiple* candidate solutions from each term using a fast subprocedure we call a *concretizer*. We implement an instance of this scheme, which we call *variable-agnostic* term generation, that produces only terms that are unique modulo alpha-equivalence. In our running example, when a term $t$ such as $x + 1$ is produced, the concretizer produces all terms generated by the grammar $R$ that are alpha-equivalent to $t$, namely, $\{x + 1, y + 1\}$ in this case. The advantage of this approach is that CVC4SY H can block any term whose variables are not canonically ordered; that is, assuming for instance that $x < y$, it may block terms like $1 - y$ and $y + y$, noting that they are alpha-equivalent to $1 - x$ and $x + x$, respectively.

To implement this blocking scheme, we introduce unary Boolean predicates $\mathsf{pre}_x$ and $\mathsf{post}_x$ for each variable $x$ in our grammar, where $\mathsf{pre}_x$ (resp., $\mathsf{post}_x$) holds for $t$ if and only if variable $x$ occurs in a depth-first, left-to-right traversal of our candidate term before (resp., after) traversing to the position indicated by the selector chain $t$. We encode the semantics of these predicates based on the arguments of constructors in our signature, e.g., $\mathsf{is}_{\mathsf{plus}}(z) \Rightarrow \big(\mathsf{pre}_x(z) \approx \mathsf{pre}_x(\mathsf{sel}^{\mathcal{I}}_1(z)) \land \mathsf{post}_x(\mathsf{sel}^{\mathcal{I}}_2(z)) \approx \mathsf{post}_x(z)\big)$. We then assert that $\mathsf{pre}_x$ and $\mathsf{pre}_y$ are false for our top-level variable $d$, and require $\mathsf{is}_{\mathbf{y}}(z) \Rightarrow \mathsf{pre}_x(z)$ for all $z$, stating that $x$ must come before $y$ in the traversal of any generated term.
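The concretizer and the canonical-order check can be sketched as follows. The code is our own illustration: `is_canonical` generalizes the two-variable constraint $\mathsf{is}_{\mathbf{y}}(z) \Rightarrow \mathsf{pre}_x(z)$ to an ordered tuple of variables by requiring their first occurrences, in a depth-first, left-to-right traversal, to follow that order, and `concretize` applies every injective renaming of the variables a term uses.

```python
from itertools import permutations

ORDER = ("x", "y")   # the grammar's variables, in the assumed order x < y

def leaves(t):
    """Leaves of t in depth-first, left-to-right order."""
    if isinstance(t, tuple):
        for child in t[1:]:
            yield from leaves(child)
    else:
        yield t

def is_canonical(t, order=ORDER):
    """A variable may occur only after every smaller variable has occurred
    (for two variables this is is_y(z) => pre_x(z))."""
    seen = set()
    for leaf in leaves(t):
        if leaf in order:
            if not set(order[:order.index(leaf)]) <= seen:
                return False
            seen.add(leaf)
    return True

def rename(t, mapping):
    if isinstance(t, tuple):
        return (t[0],) + tuple(rename(c, mapping) for c in t[1:])
    return mapping.get(t, t)

def concretize(t, order=ORDER):
    """All alpha-equivalent variants of t (injective renamings of its variables)."""
    used = []
    for leaf in leaves(t):
        if leaf in order and leaf not in used:
            used.append(leaf)
    variants = []
    for image in permutations(order, len(used)):
        v = rename(t, dict(zip(used, image)))
        if v not in variants:
            variants.append(v)
    return variants
```

`concretize(("plus", "x", 1))` yields the two variants $x + 1$ and $y + 1$, while `is_canonical` rejects $1 - y$ and $y + y$; concretizing only the canonical terms still covers every term.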

This technique is useful for grammars with many variables, such as grammars in invariant synthesis problems, where the number of terms of small size is prohibitively large. Blocking based on theory rewriting (with generalization) from Sect. 2 is compatible with this technique and is used in CVC4SY H. However, the other optimizations are disabled, since they prune solutions in a way that is not agnostic to variables.

#### **5 Evaluation**

We evaluated the above techniques in CVC4SY on four benchmark sets: invariant synthesis benchmarks from the verification of Lustre [11] models; a set from work on synthesizing invertibility conditions for bit-vector operators [12] (IC-BV); a set of bit-vector invariant synthesis problems [2] (CegisT); and the SyGuS-COMP 2018 [1] benchmarks from five tracks: assorted problems (General), conditional linear arithmetic problems (CLIA), invariant synthesis problems (INV), and programming-by-examples problems [10] with a set over bit-vectors (PBE-BV) and another over strings (PBE-Str). We also considered separately the CrCi subset from General, which corresponds to cryptographic circuit synthesis. We ran our experiments on a cluster equipped with Intel E5-2637 v4 CPUs running Ubuntu 16.04, providing one core, 1800 s, and 8 GB RAM for each job. Results are summarized in Table 1 and Fig. 2. We denote the strategies from Sects. 2, 3, and 4 by **s**, **f** and **h**, respectively (smart, fast, and hybrid); disabling the optimizations from Sect. 2 is marked by "-" and the suffixes **r** (rewriting), **rg** (rewriting with structural generalization), **cg** (CEGIS with structural generalization), and **eu** (evaluation unfolding). We also evaluated two meta-strategies of CVC4SY: **a** and **a+si**. The auto strategy **a** picks a strategy based on the properties of the problem: **f** for PBE problems and for problems without the Boolean type or the ite operator in their grammar, and **s** otherwise. Strategy **a+si** uses the single-invocation solver [14] on problems that are amenable to quantifier elimination and **a** otherwise. We use the state-of-the-art SyGuS solver EUSOLVER [4] (**EUS**) as a baseline, but only for SyGuS-COMP benchmarks due to limitations in its parser.

**Table 1.** Summary of number of problems solved per benchmark set. Best results are in **bold**.

Overall, strategy **s** excels on more challenging benchmark sets such as Lustre and Gen-Crci, while strategy **f** excels on the majority of the others. The gains for **f** are especially significant on PBE problems, where it outperforms both **s** and **EUS** by several orders of magnitude. Such gains are significant given that CVC4 won this track at SyGuS-COMP 2018 by employing **s** alone, and a variant of **EUS** won it in 2017. This result can be explained as a consequence of two factors. First, the string and bit-vector grammars contain many operators with the same type, making the constructor class optimization of the **f** algorithm very effective. Second, although not described in this paper, all solvers in our evaluation use divide-and-conquer algorithms for PBE problems [4], which are not compatible with the optimizations **cg** and **eu**. The most important optimization for all CVC4SY strategies and across all benchmark sets is **r**. The optimization **eu** is especially effective when grammars contain ite and Boolean connectives, such as those in the Lustre set and in some subsets of General, on which we can see the biggest gains of **s** with respect to **s-eu**; **cg** is more helpful for IC-BV, with a few harder benchmarks solved only due to this technique.

**Fig. 2.** Cactus plot on commonly supported benchmark sets. The first scatter plot is for the Lustre set, the second for the Gen-Crci set, and the latter two for the 862 benchmarks from the PBE sets.

The first scatter plot in Fig. 2 shows the advantage of **h** over **s** on Lustre, a benchmark set containing invariant synthesis problems with dozens of variables. We remark that this configuration excels at quickly finding small solutions for problems with many variables, although it solves fewer problems overall. The second scatter plot shows that while **s** takes significantly longer on easy problems, it outperforms **f** in the long run. The last two plots show that **f** significantly outperforms the state of the art on PBE benchmarks.

For all benchmark sets, the auto strategy **a** chooses the best enumerative strategy of CVC4SY with only a few exceptions, and hence it is the default configuration of CVC4SY. Due to specialized synthesis techniques [4,14], both **a+si** and **EUS** outperform the purely enumerative strategies of CVC4. This is reflected in the cactus plot on the commonly supported benchmark sets, where **a** and **f** solve more benchmarks than **EUS** for lower times but then **EUS** solves more benchmarks in the end. For **a+si**, the cactus plot shows that it outperforms **EUS** significantly. Nevertheless, we remark that **a+si** is able to solve only 393 (16%) of the overall benchmarks using only single invocation techniques. Hence, we conclude that both smart and fast enumerative strategies are critical subcomponents in our approach to syntax-guided synthesis.

**Acknowledgments.** This work was partially supported by the National Science Foundation under award 1656926 and by the Defense Advanced Research Projects Agency under award FA8650-18-2-7854.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Incremental Determinization for Quantifier Elimination and Functional Synthesis**

Markus N. Rabe(B)

Google, Mountain View, CA, USA mrabe@google.com

**Abstract.** Quantifier elimination and its cousin functional synthesis are fundamental problems in automated reasoning that could be used in many applications of formal methods. However, effective algorithms are still elusive. In this paper, we suggest a simple modification to a QBF algorithm to adapt it for quantifier elimination and functional synthesis. We demonstrate that the approach significantly outperforms previous algorithms for functional synthesis.

#### **1 Introduction**

Given a Boolean formula $\exists Y.\varphi$ with free variables $X$, *quantifier elimination* (also called *projection*) is the problem of finding a formula $\psi \equiv \exists Y.\varphi$ that only contains variables $X$. Closely related, the *functional synthesis* problem is to find a function $f_y : 2^X \to \mathbb{B}$ for all $y \in Y$, such that $\varphi[Y \mapsto f_y(X)] \equiv \exists Y.\varphi$.
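For very small formulas, both problems can be solved by enumerating the truth table, which makes the two definitions concrete and shows why this does not scale: the loop is exponential in $|X|$ and $|Y|$. The sketch below is illustrative only; the function name and the example formula are our own.

```python
from itertools import product

def project_and_synthesize(phi, n_x, n_y):
    """Truth-table quantifier elimination and functional synthesis for exists Y. phi.
    phi takes n_x + n_y Booleans (X first). Exponential: a reference, not an algorithm."""
    psi, skolem = {}, {}
    for xs in product((False, True), repeat=n_x):
        witnesses = [ys for ys in product((False, True), repeat=n_y) if phi(*xs, *ys)]
        psi[xs] = bool(witnesses)                     # psi(x) = exists Y. phi(x, Y)
        # f_Y(x): any witness works; the choice is arbitrary when none exists.
        skolem[xs] = witnesses[0] if witnesses else (False,) * n_y
    return psi, skolem

# Example CNF with X = {x1, x2} and Y = {y}: (x1 or y) and (not y or x2).
def phi(x1, x2, y):
    return (x1 or y) and ((not y) or x2)

psi, skolem = project_and_synthesize(phi, 2, 1)
```

Here $\psi$ is equivalent to $x_1 \lor x_2$, and the synthesized $f_y$ agrees with $\lnot x_1$ wherever a witness exists, so substituting it back into $\varphi$ yields $\psi$.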

Quantifier elimination and functional synthesis are fundamental operations in automated reasoning, computer-aided design, and verification. Hence, progress in algorithms for these problems benefits a broad range of applications of formal methods. For example, typical algorithms for reactive synthesis reduce to computing the safe region of a safety game through repeated quantifier eliminations [1–3] or directly employ functional synthesis [4]. To this day, algorithms for quantifier elimination often involve (reduced ordered) Binary Decision Diagrams (BDDs) [5]. However, BDDs often grow exponentially for applications in verification, and extracting formulas (or strategies, etc.) from BDDs typically results in huge expressions. The search for alternatives resulted in CEGAR-style algorithms [6–10].

In this work, we take a look at the closely related field of QBF solving. There, pure CEGAR solving [11–13] on the CNF representation is no longer competitive [14], and it has been augmented by preprocessing [15,16], circuit representations [17–21], and Incremental Determinization (ID) [22]. It may hence be fruitful to leverage some of the recent developments in QBF solving.

The contribution of this work is a simple modification of ID to enable quantifier elimination and functional synthesis. Incremental Determinization (ID) is an algorithm for solving quantified Boolean formulas of the shape $\forall X.\,\exists Y.\varphi$, where $\varphi$ is a propositional formula in conjunctive normal form (CNF), i.e., 2QBF. It follows a proof-theoretic approach, much like a SAT solver, alternating between building a model (i.e., Skolem functions for the existential variables $Y$) and a refutation proof [23]. This allows ID to provide a model (i.e., a Skolem function) when it determines that a formula is true, which sets it apart from other QBF algorithms.

© The Author(s) 2019

M.N. Rabe–Work partially done at University of California at Berkeley.

I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 84–94, 2019. https://doi.org/10.1007/978-3-030-25543-5\_6

The modification of ID to enable quantifier elimination for a given formula $\exists Y.\varphi$ is very simple: we run ID on the formula as if it were a quantified Boolean formula $\forall X.\,\exists Y.\varphi$, where $X$ are the free variables, but add $\varphi$ to the conflict check within ID. This suppresses the UNSAT result in the ID algorithm, and ID is hence forced to terminate with a model (that is, a function), which is guaranteed to satisfy the functional synthesis requirements. Quantifier elimination is then only a substitution away.

Our experimental evaluation shows that ID significantly outperforms previous algorithms for functional synthesis and quantifier elimination.

This paper is structured as follows: We review related work in Sect. 2 and introduce standard notation in Sect. 3. In Sect. 4 we first review the Incremental Determinization algorithm before introducing the change necessary to lift it to functional synthesis. The experimental evaluation is in Sect. 5. We summarize the current state of the tool CADET in Sect. 6 and conclude the paper in Sect. 7.

#### **2 Related Work**

*Functional Synthesis.* Early works on functional synthesis tried to exploit Craig interpolation, but did not scale well enough [24]. This was followed by first attempts to use CEGAR [6], which failed, however, to surpass the performance of BDDs [7]. More recent works revisited the use of BDDs, e.g., the tools SSyft [25] and RSynth [26,27]. This motivated the search for alternatives to BDDs [8–10]. At their core, these new algorithms all rely on counterexample-guided abstraction refinement (CEGAR) [28], but they apply it in clever, compositional ways. However, they still inherit the well-known weaknesses of CEGAR (as discussed, for example, in the QBF literature): for the simple formula $\varphi = \bigwedge_{i<n} x_i \leftrightarrow y_i$, where $n = |X| = |Y|$, $x_i \in X$, and $y_i \in Y$, CEGAR needs to browse through $2^n$ satisfying assignments just to recover that the function we were looking for is $f(x) = x$.

The Back-and-Forth algorithm explores stronger abstractions, using MaxSAT solvers as a means to reduce the number of assignments that CEGAR needs to explore [8]. ParSyn attempts to combat the problem with parallel computing power and a compositional approach [9]. This compositional approach was later refined using a wDNNF decomposition [10].

*QBF Certification.* Some solvers and preprocessors for QBF can not only provide a yes/no answer but also produce a certificate (i.e., Skolem functions) for their result [13,22,29,30]. While most QBF approaches suffer heavy performance penalties when asked to provide a certificate, Incremental Determinization naturally computes Skolem functions that can be extracted easily from the final state [22].

#### **3 Preliminaries**

Boolean formulas over a finite set of variables $x \in X$ with domain $\mathbb{B} = \{\mathbf{0}, \mathbf{1}\}$ are generated by the following grammar:

$$\varphi \coloneqq \mathbf{0} \mid \mathbf{1} \mid x \mid \neg \varphi \mid (\varphi) \mid \varphi \vee \varphi \mid \varphi \wedge \varphi$$

Other logical operations, such as implication, XOR, and equality, are considered syntactic sugar with the usual definitions.

An *assignment* $\boldsymbol{x}$ to a set of variables $X$ is a function $\boldsymbol{x} : X \to \mathbb{B}$ that maps each variable $x \in X$ to either $\mathbf{1}$ or $\mathbf{0}$. We denote the space of assignments to a set of variables $X$ by $2^X$.

Given formulas $\varphi$ and $\varphi'$, and a variable $x$, we denote the substitution of $x$ by $\varphi'$ in $\varphi$ as $\varphi[x \mapsto \varphi']$. We lift substitutions to sets of variables, $\varphi[X \mapsto t_x]$, when $t_x$ maps each $x \in X$ to a formula.

A *literal* $l$ is either a variable $x \in X$ or its negation $\neg x$. We use $\bar{l}$ to denote the literal that is the logical negation of $l$. A disjunction of literals $(l_1 \vee \dots \vee l_n)$ is called a *clause* and their conjunction $(l_1 \wedge \dots \wedge l_n)$ is called a *cube*. We denote the variable of a literal by $\mathit{var}(l)$ and lift the notion to clauses: $\mathit{var}(l_1 \vee \dots \vee l_n) = \{\mathit{var}(l_1), \dots, \mathit{var}(l_n)\}$.

A formula is in *conjunctive normal form* (CNF) if it is a conjunction of clauses. Throughout this exposition, we assume that the input formula is given in CNF. (The output, however, can be a non-CNF formula.) It is trivial to lift the approach to general Boolean formulas: given a Boolean formula $\varphi$ over variables $X$, the Tseitin transformation provides us with a formula $\psi$ such that $\varphi \equiv \exists Z.\psi$, where $Z$ are fresh variables [31]. Note that eliminating a group of variables $X' \subseteq X$ in $\varphi$ is then the same as eliminating $X' \cup Z$ in $\psi$.
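As a sanity check of $\varphi \equiv \exists Z.\psi$, the following Python sketch implements a textbook Tseitin encoding for formulas built from variables, $\neg$, $\wedge$, and $\vee$ (constants are omitted for brevity) and compares both sides by enumerating all assignments. The tuple representation and the names `tseitin` and `equivalent_up_to_fresh` are our own choices for illustration, not part of CADET.

```python
from itertools import product

# Formulas as nested tuples: ("var", name), ("not", f), ("and", f, g), ("or", f, g)


def evaluate(f, a):
    tag = f[0]
    if tag == "var":
        return a[f[1]]
    if tag == "not":
        return not evaluate(f[1], a)
    if tag == "and":
        return evaluate(f[1], a) and evaluate(f[2], a)
    return evaluate(f[1], a) or evaluate(f[2], a)  # "or"


def tseitin(f):
    """Return (clauses, fresh_vars); literals are (name, polarity) pairs."""
    clauses, fresh = [], []

    def enc(g):
        if g[0] == "var":
            return g[1]
        z = f"z{len(fresh)}"  # fresh variable naming this subformula
        fresh.append(z)
        if g[0] == "not":  # z <-> not a
            a = enc(g[1])
            clauses.extend([[(z, True), (a, True)], [(z, False), (a, False)]])
        elif g[0] == "and":  # z <-> a and b
            a, b = enc(g[1]), enc(g[2])
            clauses.extend([[(z, False), (a, True)], [(z, False), (b, True)],
                            [(z, True), (a, False), (b, False)]])
        else:  # z <-> a or b
            a, b = enc(g[1]), enc(g[2])
            clauses.extend([[(z, True), (a, False)], [(z, True), (b, False)],
                            [(z, False), (a, True), (b, True)]])
        return z

    root = enc(f)
    clauses.append([(root, True)])  # assert the root
    return clauses, fresh


def sat_clauses(clauses, a):
    return all(any(a[v] == pol for v, pol in c) for c in clauses)


def equivalent_up_to_fresh(f, xs):
    """Check f == exists Z. tseitin(f) by enumerating all assignments."""
    clauses, zs = tseitin(f)
    for xv in product([False, True], repeat=len(xs)):
        a = dict(zip(xs, xv))
        exists = any(sat_clauses(clauses, {**a, **dict(zip(zs, zv))})
                     for zv in product([False, True], repeat=len(zs)))
        if exists != evaluate(f, a):
            return False
    return True


# Example: (x1 or not x2) and not (x1 and x2)
f = ("and", ("or", ("var", "x1"), ("not", ("var", "x2"))),
     ("not", ("and", ("var", "x1"), ("var", "x2"))))
print(equivalent_up_to_fresh(f, ["x1", "x2"]))  # True
```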

*Resolution* is a well-known proof rule that allows us to merge two clauses as follows. Given two clauses $C_1 \vee v$ and $C_2 \vee \neg v$, we call $C_1 \otimes_v C_2 = C_1 \vee C_2$ their *resolvent* with pivot $v$. The resolution rule states that $C_1 \vee v$ and $C_2 \vee \neg v$ imply their resolvent. Resolution is *refutationally complete* for Boolean formulas in CNF, i.e. given a formula in CNF that is equivalent to false, we can derive the empty clause using only resolution.
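With clauses represented as sets of DIMACS-style integer literals (a common solver convention in which $-v$ stands for $\neg v$), a resolution step is essentially a one-liner. The helper name `resolve` is hypothetical.

```python
def resolve(c1, c2, v):
    """Resolvent of c1 (containing v) and c2 (containing -v) on pivot v > 0.
    Clauses are frozensets of nonzero ints; -v means "not v"."""
    if v not in c1 or -v not in c2:
        raise ValueError("pivot must occur positively in c1 and negatively in c2")
    return (c1 - {v}) | (c2 - {-v})


# (x1 or y) and (x2 or not y) imply (x1 or x2); numbering: x1=1, x2=2, y=3
print(sorted(resolve(frozenset({1, 3}), frozenset({2, -3}), 3)))  # [1, 2]
# (y) and (not y) resolve to the empty clause, refuting the conjunction
print(resolve(frozenset({3}), frozenset({-3}), 3))  # frozenset()
```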

#### **4 Lifting Incremental Determinization**

In the sequel, we formally define functional synthesis, review the working principle of Incremental Determinization for 2QBF, discuss how the solver state corresponds to functions, and then introduce the modification that turns Incremental Determinization into an algorithm for functional synthesis. The *functional synthesis* problem is to find a function $f_y : 2^X \to \mathbb{B}$ for all $y \in Y$, such that $\varphi[Y \mapsto f_y(X)] \equiv \exists Y.\varphi$. Functional synthesis is closely related to solving 2QBF: given a true 2QBF problem $\forall X.\, \exists Y.\varphi$, any Skolem function that is a model for the formula is also a solution to the functional synthesis problem for the variable sets $X$ and $Y$. The two problems differ only for false 2QBF: if there is an assignment $\boldsymbol{x}$ to $X$ for which no assignment to $Y$ satisfies $\varphi$, the 2QBF cannot be proven with a Skolem function, but the functional synthesis problem still requires us to produce a function $f$. For such an input $\boldsymbol{x}$, $f$ may produce any output. We will exploit this similarity between 2QBF and functional synthesis in the following to lift the Incremental Determinization algorithm to functional synthesis.
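The difference between the two problems can be checked by brute force on a tiny formula of our own choosing, $\varphi = (y \leftrightarrow x_1) \wedge (x_2 \vee \neg y)$, which has no satisfying $y$ for $x_1 \mapsto \mathbf{1}, x_2 \mapsto \mathbf{0}$. The Python sketch below (hypothetical helper names, exponential enumeration instead of a solver) shows that the 2QBF $\forall X.\, \exists Y.\varphi$ is false, yet $f_y(X) = x_1$ solves the functional synthesis problem.

```python
from itertools import product


def solves_functional_synthesis(phi, f, n_x, n_y):
    """Brute-force check that phi[Y -> f(X)] is equivalent to exists Y. phi."""
    for x in product([False, True], repeat=n_x):
        exists_y = any(phi(x, y) for y in product([False, True], repeat=n_y))
        if phi(x, f(x)) != exists_y:
            return False
    return True


def is_true_2qbf(phi, n_x, n_y):
    """forall X. exists Y. phi, by enumeration."""
    return all(any(phi(x, y) for y in product([False, True], repeat=n_y))
               for x in product([False, True], repeat=n_x))


def phi(x, y):
    # (y <-> x1) and (x2 or not y); no y exists for x1 = 1, x2 = 0
    return (y[0] == x[0]) and (x[1] or not y[0])


def f(x):
    return (x[0],)  # candidate f_y(X) = x1


print(is_true_2qbf(phi, 2, 1))                    # False
print(solves_functional_synthesis(phi, f, 2, 1))  # True
```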

#### **4.1 Working Principle of Incremental Determinization for 2QBF**

ID was originally introduced as an algorithm for 2QBF, the fragment of quantified Boolean formulas with at most one quantifier alternation. Given a formula $\forall X.\, \exists Y.\varphi$, ID alternates between constructing a model (i.e. a Skolem function) to prove the formula correct and constructing a Q-resolution proof to refute the formula [32]. During model construction, ID identifies which variables in $Y$ have unique Skolem functions considering the current set of clauses. When all variables with unique Skolem functions are identified, ID greedily introduces additional clauses to reduce the space of possible Skolem functions, such that the remaining variables may get unique Skolem functions, too. Whenever the model construction ends up in a dead end (a conflict), ID switches to constructing a refutation proof [32] and derives clauses using resolution. As soon as ID has found a clause that prevents the model construction from trying the same partial model again, it switches back to the model search. Since there are only finitely many clauses and models, either the model construction or the refutation proof must eventually finish [22,23].

*Example 1.* We will use the following formula as a running example:

$$\begin{array}{l} \forall x_1, x_2.\ \exists y_1, y_2, y_3.\ (x_1 \vee \neg y_1) \wedge (\neg x_1 \vee y_1) \wedge {} \\ \qquad (y_1 \vee \neg y_2) \wedge (\neg y_1 \vee \neg x_2 \vee y_2) \wedge {} \\ \qquad (\neg y_1 \vee y_3) \wedge (y_2 \vee \neg y_3) \wedge (x_2 \vee \neg y_3) \end{array}$$

Looking at the first two clauses, it is clear that $y_1$ is uniquely determined by $x_1$ and that $y_1$'s Skolem function must be $f_{y_1}(X) = x_1$. For this step, we intentionally ignore all clauses of $y_1$ that contain $y_2$ or $y_3$, as these variables do not yet have a Skolem function and we have to consider them as undefined. The other clauses containing $y_1$ will only become relevant when looking for Skolem functions for $y_2$ and $y_3$.

Variables $y_2$ and $y_3$ do not have *unique* Skolem functions in the formula above. ID would now greedily add a *decision clause*, such as $(x_2 \vee \neg y_2)$, to also make the Skolem function for $y_2$ unique. The added clause, together with clauses 3 and 4 of the formula, defines $f_{y_2}(X) = f_{y_1}(X) \wedge x_2$.

This results in the situation that there is no Skolem function for $y_3$: for the assignment $x_1 \mapsto \mathbf{1}, x_2 \mapsto \mathbf{0}$, the functions for $y_1$ and $y_2$ assign $y_1 \mapsto \mathbf{1}, y_2 \mapsto \mathbf{0}$. Then clauses 5 and 6 cannot both be satisfied by $y_3$, which means there is a conflict for this assignment to the universals. During conflict analysis, ID would now resolve clauses 5 and 6 to obtain the clause $(\neg y_1 \vee y_2)$, and then backtrack to the point before introducing the decision clause.
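The claims of Example 1 can be replayed by enumeration. The Python sketch below encodes the formula with DIMACS-style literals (our numbering: $x_1, x_2, y_1, y_2, y_3 \mapsto 1, \dots, 5$), plugs in $f_{y_1}$ and $f_{y_2}$, and lists the values of $y_3$ that satisfy all clauses plus the decision clause; for $x_1 \mapsto \mathbf{1}, x_2 \mapsto \mathbf{0}$ none remains. It also recomputes the resolvent of clauses 5 and 6. It only mimics the reasoning; it is not how CADET represents its state.

```python
from itertools import product

# Running example, DIMACS-style numbering (our choice): x1=1, x2=2, y1=3, y2=4, y3=5
PHI = [[1, -3], [-1, 3], [3, -4], [-3, -2, 4], [-3, 5], [4, -5], [2, -5]]
DECISION = [2, -4]  # decision clause (x2 or not y2)


def holds(clause, a):
    return any(a[abs(l)] == (l > 0) for l in clause)


def skolem(x1, x2):
    y1 = x1         # f_y1(X) = x1
    y2 = y1 and x2  # f_y2(X) = f_y1(X) and x2, forced by clauses 3, 4 and the decision
    return y1, y2


for x1, x2 in product([False, True], repeat=2):
    y1, y2 = skolem(x1, x2)
    a = {1: x1, 2: x2, 3: y1, 4: y2}
    ok_y3 = [v for v in (False, True)
             if all(holds(c, {**a, 5: v}) for c in PHI + [DECISION])]
    # for (1, 0) the list is empty: the conflict
    print((int(x1), int(x2)), "admissible y3 values:", ok_y3)

# Conflict analysis: resolving clauses 5 and 6 on y3 yields (not y1 or y2)
c5, c6 = set(PHI[4]), set(PHI[5])
print(sorted((c5 - {5}) | (c6 - {-5})))  # [-3, 4]
```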

#### **4.2 Representation of Functions**

What is particularly interesting about ID is its ability to produce Skolem functions when it has proven a formula correct. Unlike previous QBF algorithms, ID produces these Skolem functions without any overhead.

ID avoids costly representations of Skolem functions: it maintains a set $D \subseteq Y$ of variables that have a unique Skolem function, and its state includes a formula $\delta$ characterizing the input-output behavior of the Skolem functions for the variables in $D$. Formula $\delta$ satisfies $\forall X.\, \exists! D.\, \delta$, where $\exists! D$ means that there exists exactly one assignment to $D$. We can thus also think of $\delta$ as a function $f_\delta$ mapping assignments to $X$ to assignments to $D$.

*Example 2.* Back to our running example. After identifying a unique Skolem function for $y_1$, formula $\delta$ consists exactly of the first two clauses of the formula, $(x_1 \vee \neg y_1) \wedge (\neg x_1 \vee y_1)$. After adding the decision clause and identifying a unique Skolem function for $y_2$, $\delta$ consists exactly of the first four clauses and the decision clause.
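For the running example, the property $\forall X.\, \exists! D.\, \delta$ can be checked by counting, for every input, the assignments to $D = \{y_1, y_2\}$ that satisfy $\delta$. The sketch below uses brute force and a DIMACS-style numbering of our choosing ($x_1, x_2, y_1, y_2 \mapsto 1, \dots, 4$); the table it builds is the function $f_\delta$.

```python
from itertools import product

# delta after the decision: clauses 1-4 of the running example plus (x2 or not y2)
# DIMACS-style numbering (our choice): x1=1, x2=2, y1=3, y2=4
DELTA = [[1, -3], [-1, 3], [3, -4], [-3, -2, 4], [2, -4]]


def satisfies(clauses, a):
    return all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses)


def models_of_D(x1, x2):
    """All assignments to D = {y1, y2} that satisfy delta for the given inputs."""
    return [(y1, y2) for y1, y2 in product([False, True], repeat=2)
            if satisfies(DELTA, {1: x1, 2: x2, 3: y1, 4: y2})]


# forall X. exists! D. delta: every input has exactly one D-assignment,
# so delta can be read as the function f_delta
f_delta = {}
for x in product([False, True], repeat=2):
    ms = models_of_D(*x)
    assert len(ms) == 1, (x, ms)
    f_delta[x] = ms[0]
print(f_delta)
```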

#### **4.3 Conflict Checks in ID**

The formulas representing functions have primarily one purpose: to check for the existence of *conflicts*. Whenever we attempt to grow the set $D$ by a variable $v$, we need to check whether $v$ has a unique Skolem function. This check consists of two parts; given an arbitrary universal assignment $\boldsymbol{x} \in 2^X$,


To define this formally, let us consider the clauses $(d_1 \vee \dots \vee d_n \vee l)$ in $\varphi$ that contain a literal $l$ of variable $v$ and otherwise only contain literals $d_i$ of variables in $D$ and $X$. We call these the clauses with *unique consequence*, as they can be read as implications $(\neg d_1 \wedge \dots \wedge \neg d_n \Rightarrow l)$, and we call $\neg d_1 \wedge \dots \wedge \neg d_n$ the *antecedent* of that clause. Further, we define $A_l$ as the disjunction over all antecedents of literal $l$. (Note that $A_l$ depends on $D$ and therefore changes as the state of the solver progresses.)
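A direct reading of this definition is sketched below in Python, on the running example with a DIMACS-style numbering of our choosing ($x_1, x_2, y_1, y_2, y_3 \mapsto 1, \dots, 5$). The function `antecedents` is hypothetical; it returns each antecedent as a cube (a list of literals), so $A_l$ is the disjunction of the returned cubes. Shrinking the set of known variables shows how $A_l$ depends on $D$.

```python
# Running example, DIMACS-style numbering (our choice): x1=1, x2=2, y1=3, y2=4, y3=5
PHI = [[1, -3], [-1, 3], [3, -4], [-3, -2, 4], [-3, 5], [4, -5], [2, -5]]


def antecedents(clauses, lit, known):
    """Cubes (not d1 and ... and not dn) for every clause (d1 or ... or dn or lit)
    whose other literals are over known variables (D and X)."""
    result = []
    for c in clauses:
        if lit in c and all(abs(d) in known for d in c if d != lit):
            result.append(sorted(-d for d in c if d != lit))
    return result


X, D = {1, 2}, {3, 4}
print(antecedents(PHI, 5, X | D))      # [[3]]         i.e. A_{y3} = y1
print(antecedents(PHI, -5, X | D))     # [[-4], [-2]]  i.e. A_{not y3} = not y2 or not x2
print(antecedents(PHI, -5, X | {3}))   # [[-2]]        (y2 not yet in D)
```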

The two checks from above can now be defined as follows:


Checking for case (1) can be efficiently approximated [22], but checking for case (2) cannot easily be avoided. We thus query a SAT solver with $\delta \wedge A_v \wedge A_{\neg v}$ to perform a conflict check.

*Example 3.* We revisit the conflict described in Example 1. The starting point is the situation when $D = \{y_1, y_2\}$ and $\delta$ consists of the first four clauses of the formula as well as the decision clause $(x_2 \vee \neg y_2)$. The antecedents of $y_3$ are $A_{y_3} = y_1$ and $A_{\neg y_3} = \neg y_2 \vee \neg x_2$. It is easy to verify that the assignment $x_1 \mapsto \mathbf{1}, x_2 \mapsto \mathbf{0}, y_1 \mapsto \mathbf{1}, y_2 \mapsto \mathbf{0}$ satisfies the conflict criterion $\delta \wedge A_{y_3} \wedge A_{\neg y_3}$.
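For a formula this small, the SAT query can be replaced by enumeration. The sketch below uses hypothetical names, represents $\delta$ in CNF and the antecedents as disjunctions of cubes, numbers $x_1, x_2, y_1, y_2 \mapsto 1, \dots, 4$, and finds exactly the conflicting assignment named in Example 3.

```python
from itertools import product

# DIMACS-style numbering (our choice): x1=1, x2=2, y1=3, y2=4
DELTA = [[1, -3], [-1, 3], [3, -4], [-3, -2, 4], [2, -4]]  # clauses 1-4 + decision
A_POS = [[3]]          # A_{y3} = y1                     (disjunction of cubes)
A_NEG = [[-4], [-2]]   # A_{not y3} = not y2 or not x2


def lit(l, a):
    return a[abs(l)] == (l > 0)


def conflict_assignments():
    """Models of delta and A_v and A_not_v (a SAT call in ID, brute force here)."""
    for bits in product([False, True], repeat=4):
        a = dict(zip([1, 2, 3, 4], bits))
        if (all(any(lit(l, a) for l in c) for c in DELTA)               # delta (CNF)
                and any(all(lit(l, a) for l in cube) for cube in A_POS)    # A_v
                and any(all(lit(l, a) for l in cube) for cube in A_NEG)):  # A_not_v
            yield a


print(list(conflict_assignments()))  # [{1: True, 2: False, 3: True, 4: False}]
```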

#### **4.4 Functional Synthesis**

Remember that in the case of functional synthesis for $\varphi$ over sets of variables $X$ and $Y$, we search for a function $f : 2^X \to 2^Y$ such that $f$ produces a satisfying assignment whenever it can, but may produce anything when there is no assignment to $Y$ satisfying the formula. If there are satisfying assignments to $Y$ for all assignments to $X$, we can simply run ID as if it were solving the 2QBF $\forall X.\, \exists Y.\varphi$ to obtain a Skolem function that also satisfies the functional synthesis criterion. Otherwise, if there is an assignment to $X$ for which no assignment to $Y$ satisfies $\varphi$, ID for 2QBF would eventually detect a conflict that does not depend on a decision and return UNSAT.

In order to lift ID to functional synthesis, we want to ignore universal assignments that have no satisfying assignment to Y . A simple way to suppress these conflicts is to add ϕ to the conflict check. In order for an assignment to X to remain a conflict, we must now additionally find an assignment to Y that demonstrates that the conflict could be prevented by a different decision.

All other parts of ID, including the extraction of functions, remain untouched. In particular, termination is still guaranteed, as the greedy model construction either results in a function for all variables in Y or in a conflict, upon which at least one model is excluded through resolution.

*Example 4.* For the conflict in our running example, the universal assignment $x_1 \mapsto \mathbf{1}, x_2 \mapsto \mathbf{0}$ is excluded by the modified conflict check, because no assignment to $Y$ satisfies $\varphi$ for it. To see this, consider the UNSAT core consisting of clauses 2, 5, and 7 for that universal assignment: propagate $y_1 \mapsto \mathbf{1}$ using clause 2, propagate $y_3 \mapsto \mathbf{1}$ using clause 5, and finally propagate $y_3 \mapsto \mathbf{0}$ using clause 7. So, instead of going into conflict analysis and backtracking, ID for functional synthesis concludes that it has found a function for all existential variables and terminates.
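The effect of the modification on Example 4 can also be sketched by brute force. We model "adding $\varphi$" by additionally requiring that $\varphi$ be satisfiable for the same assignment to $X$ with some assignment to $Y$ chosen independently of the values fixed by $\delta$; this is our reading of the description above, not code from CADET, and all names are hypothetical.

```python
from itertools import product

# Running example, DIMACS-style numbering (our choice): x1=1, x2=2, y1=3, y2=4, y3=5
PHI = [[1, -3], [-1, 3], [3, -4], [-3, -2, 4], [-3, 5], [4, -5], [2, -5]]
DELTA = [[1, -3], [-1, 3], [3, -4], [-3, -2, 4], [2, -4]]  # clauses 1-4 + decision
A_POS, A_NEG = [[3]], [[-4], [-2]]  # A_{y3} = y1, A_{not y3} = not y2 or not x2


def lit(l, a):
    return a[abs(l)] == (l > 0)


def cnf(clauses, a):
    return all(any(lit(l, a) for l in c) for c in clauses)


def dnf(cubes, a):
    return any(all(lit(l, a) for l in cube) for cube in cubes)


def phi_satisfiable(x1, x2):
    """Assumption: phi is checked with its own assignment to Y."""
    return any(cnf(PHI, {1: x1, 2: x2, 3: b1, 4: b2, 5: b3})
               for b1, b2, b3 in product([False, True], repeat=3))


def conflicts(with_phi):
    found = []
    for bits in product([False, True], repeat=4):
        a = dict(zip([1, 2, 3, 4], bits))
        if cnf(DELTA, a) and dnf(A_POS, a) and dnf(A_NEG, a):
            if not with_phi or phi_satisfiable(a[1], a[2]):
                found.append(a)
    return found


print(len(conflicts(with_phi=False)))  # 1: the 2QBF check reports the conflict
print(len(conflicts(with_phi=True)))   # 0: x1=1, x2=0 has no satisfying Y and is ignored
```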

#### **4.5 Quantifier Elimination**

Given a formula $\exists Y.\varphi$ with free variables $X$, *quantifier elimination* is the problem of finding a formula $\psi \equiv \exists Y.\varphi$ over the variables $X$ only. Hence, given a solution $f$ to the functional synthesis problem for $\varphi$, we only have to substitute $Y$ by $f$ in $\varphi$ to obtain the projected formula $\psi = \varphi[Y \mapsto f(X)]$.
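The following Python sketch illustrates the substitution step on the running example. The function $f$ ($y_1 = x_1$, $y_2 = y_3 = x_1 \wedge x_2$) is a solution we picked by hand for illustration, not output of CADET. The sketch evaluates $\varphi[Y \mapsto f(X)]$ and compares it with $\exists Y.\varphi$ on all inputs, which here yields $\psi \equiv \neg x_1 \vee x_2$.

```python
from itertools import product

# Running example, DIMACS-style numbering (our choice): x1=1, x2=2, y1=3, y2=4, y3=5
PHI = [[1, -3], [-1, 3], [3, -4], [-3, -2, 4], [-3, 5], [4, -5], [2, -5]]


def holds(clauses, a):
    return all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses)


def f(x1, x2):
    """Hand-picked (hypothetical) solution: y1 = x1, y2 = y3 = x1 and x2."""
    return {3: x1, 4: x1 and x2, 5: x1 and x2}


def psi(x1, x2):
    # psi = PHI[Y -> f(X)]: substitute the functions and evaluate over X only
    return holds(PHI, {1: x1, 2: x2, **f(x1, x2)})


def exists_y(x1, x2):
    return any(holds(PHI, {1: x1, 2: x2, 3: a, 4: b, 5: c})
               for a, b, c in product([False, True], repeat=3))


for x1, x2 in product([False, True], repeat=2):
    assert psi(x1, x2) == exists_y(x1, x2)
print([int(psi(x1, x2)) for x1, x2 in product([False, True], repeat=2)])  # [1, 1, 0, 1]
```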

#### **5 Experimental Evaluation**

We implemented the modifications to ID in CADET,<sup>1</sup> a competitive 2QBF solver [22]. In this section, we compare CADET experimentally with existing algorithms for functional synthesis. Additionally, we implemented a certificate checker for functional synthesis and for quantifier elimination, to make sure that the computed functions are correct. The certificate checker shares only the code for AIGER circuits and the SAT solver (of which we have tried several) with CADET, but is otherwise completely independent, to reduce the chance of correlated bugs. The results of CADET have been checked with the certificate checker; the running times reported below exclude the time to check the certificates.

<sup>1</sup> CADET is available at https://github.com/MarkusRabe/cadet.

**Fig. 1.** Log-scale cactus plot comparing the performance over all instances.

So far, there is no standard benchmark for functional synthesis or quantifier elimination. Like previous works on functional synthesis, we resort to the 2QBF benchmark from QBFEVAL'17 [14] and re-interpret its formulas as functional synthesis problems. The 2QBF benchmark from QBFEVAL'17 is a collection of 384 formulas from various domains, mostly from software verification, program synthesis, and logical equivalences [33–36].

We compare CADET to the most recent tools for functional synthesis, BaFSyn [8] and BFSS [10], the latter of which has been shown to consistently outperform the earlier, BDD-based tools SSyft [25] and RSynth [26,27]. We ran CADET in two configurations: with (CADET+) and without (CADET) its CEGAR module [23]. We present the results as a cactus plot (Fig. 1), which is obtained by running each tool on all formulas and sorting the running times for each tool separately. A point $(x, y)$ in this plot means that $x$ formulas were solved in less than time $y$. Note that the time axis is in log-scale.
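Computing the points of one cactus-plot curve amounts to sorting the solved instances by running time. A minimal sketch, assuming one running time per formula (with `None` for a failed run) and a timeout parameter; the function name and the input values are toy illustrations, not measurements from this evaluation.

```python
def cactus_points(times, timeout):
    """Points of one cactus-plot curve: (x, y) means x formulas were solved within time y.
    `times` holds one running time per formula; None or values above timeout are unsolved."""
    solved = sorted(t for t in times if t is not None and t <= timeout)
    return [(i + 1, t) for i, t in enumerate(solved)]


# Toy input for illustration only (not measurements from this paper)
print(cactus_points([3.2, None, 0.4, 12.0, 700.0], timeout=600))
# [(1, 0.4), (2, 3.2), (3, 12.0)]
```

Because each tool's times are sorted independently, a curve says nothing about which formulas a tool solved, only how many it solved within a given time.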

CADET shows a clear edge in performance: it is one to two orders of magnitude faster than its strongest competitor, BFSS, and can solve significantly more formulas. But despite the clear performance advantage in this aggregate view, BaFSyn and BFSS can be faster for individual formulas or subfamilies of QBFEVAL, as shown in previous works [8,10].

#### **6 The Current State of CADET**

Originally designed as an experimentation platform, CADET has grown to become a performant and versatile tool for the synthesis of Boolean functions. It consistently wins awards at the annual QBFEVAL competitions, and is the only such tool able to prove all its results [14].

CADET reads specifications in the QDIMACS and the QAIGER formats, and now supports the synthesis of Boolean functions for 2QBF, functional synthesis, and quantifier elimination with the command line options -c [file], -f [file], and -e [file]. The functions computed by CADET are much smaller than those found by CEGAR-based algorithms [22], and in its default configuration, CADET double-checks its results before reporting them. This can be deactivated with the flag --dontverify.

CADET has also been integrated into py-aiger [37], a Python package by Marcell Vazquez-Chanlatte for conveniently handling circuits, which enables us to easily model and prototype new approaches. For example, we can write:

```
import aiger_analysis as aa
import aigerbv as bv

x = bv.atom(32, 'x')  # Create a 32-bit variable
y = bv.atom(32, 'y')
expr = (x != y)
result = aa.eliminate(expr, ['y'])  # Eliminate the quantified variable y
assert aa.is_equal(x, result)
```
CADET also has an experimental reinforcement learning interface that allows us to automatically learn decision heuristics with the help of graph neural networks. A recent effort shows that there is huge potential in learning better branching heuristics from scratch [38].

#### **7 Conclusions**

In this work, we extended ID with the ability to solve functional synthesis and quantifier elimination problems. The extension is very simple: we only need to add the clauses of the original formula to its conflict check. The resulting algorithm significantly outperforms previous algorithms for functional synthesis.

**Acknowledgements.** The author wants to thank Shubham Goel, Shetal Shah, and Lucas Tabajara for insightful discussions and for their assistance with running their functional synthesis tools. In particular, I want to express my gratitude to Supratik Chakraborty for inspiring me to work on the topic in a discussion in the summer of 2016.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Numerical Programs

## **Loop Summarization with Rational Vector Addition Systems**

Jake Silverman(B) and Zachary Kincaid

Princeton University, Princeton, USA {Jakers,ZKincaid}@CS.Princeton.edu

**Abstract.** This paper presents a technique for computing numerical loop summaries. The method synthesizes a rational vector addition system with resets (Q-VASR) that simulates the action of an input loop, and then uses the reachability relation of that Q-VASR to over-approximate the behavior of the loop. The key technical problem solved in this paper is to automatically synthesize a Q-VASR that is a *best abstraction* of a given loop in the sense that (1) it simulates the loop and (2) it is simulated by any other Q-VASR that simulates the loop. Since our loop summarization scheme is based on computing the *exact* reachability relation of a *best* abstraction of a loop, we can make theoretical guarantees about its behavior. Moreover, we show experimentally that the technique is precise and performant in practice.

#### **1 Introduction**

Modern software verification techniques employ a number of heuristics for reasoning about loops. While these heuristics are often effective, they are unpredictable. For example, an abstract interpreter may fail to find the most precise invariant expressible in the language of its abstract domain due to imprecise widening, or a software model checker might fail to terminate because it generates interpolants that are insufficiently general. This paper presents a loop summarization technique that is capable of generating loop invariants in an expressive and decidable language and provides theoretical guarantees about invariant quality.

The key idea behind our technique is to leverage reachability results of vector addition systems (VAS) for invariant generation. Vector addition systems are a class of infinite-state transition systems with decidable reachability, classically used as a model of parallel systems [12]. We consider a variation of VAS, *rational VAS with resets* (Q-VASR), wherein there is a finite number of rational-typed variables and a finite set of transitions that simultaneously update each variable in the system by either adding a constant value or (re)setting the variable to a constant value. Our interest in Q-VASRs stems from the fact that there is a (polytime) procedure to compute a linear arithmetic formula that represents a Q-VASR's reachability relation [8].
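To make the transition semantics concrete, the sketch below represents a Q-VASR transition as a pair $(r, a)$ of a reset vector $r \in \{0, 1\}^d$ and an addition vector $a \in \mathbb{Q}^d$: coordinate $i$becomes $x_i + a_i$ when $r_i = 1$ and is reset to $a_i$ when $r_i = 0$. This encoding, the function names, and the two-dimensional transitions are illustrative assumptions. The sketch only executes runs; it does not compute the reachability relation.

```python
from fractions import Fraction


def step(state, transition):
    """Apply a Q-VASR transition (r, a): coordinate i becomes x_i + a_i if r_i == 1
    (increment) and a_i if r_i == 0 (reset)."""
    r, a = transition
    return tuple(x + ai if ri else ai for x, ri, ai in zip(state, r, a))


def run(state, word):
    """Apply a sequence of transitions, one after the other."""
    for t in word:
        state = step(state, t)
    return state


# Two-dimensional example with hypothetical transitions:
inc = ((1, 1), (Fraction(1, 2), Fraction(-1)))   # x += 1/2, y -= 1
reset = ((0, 1), (Fraction(0), Fraction(3)))     # x := 0,   y += 3
print(run((Fraction(0), Fraction(0)), [inc, inc, reset, inc]))
# (Fraction(1, 2), Fraction(0, 1))
```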

Since the reachability relation of a Q-VASR is computable, the dynamics of Q-VASR can be analyzed without relying on heuristic techniques. However, there is a gap between Q-VASR and the loops that we are interested in summarizing. The latter typically use a rich set of operations (memory manipulation, conditionals, non-constant increments, non-linear arithmetic, etc.) and cannot be analyzed precisely. We bridge the gap with a procedure that, for any loop, synthesizes a Q-VASR that simulates it. The reachability relation of the Q-VASR can then be used to over-approximate the behavior of the loop. Moreover, we prove that if a loop is expressed in linear rational arithmetic (LRA), then our procedure synthesizes a *best* Q-VASR abstraction, in the sense that it is simulated by any other Q-VASR that simulates the loop. That is, imprecision in the analysis is due to inherent limitations of the Q-VASR model, rather than to heuristic algorithmic choices.

One limitation of the model is that Q-VASRs over-approximate multi-path loops by treating the choice between paths as non-deterministic. We show that Q-VASRS, Q-VASR extended with control states, can be used to improve our invariant generation scheme by encoding control flow information and interpath control dependencies that are lost in the Q-VASR abstraction. We give an algorithm for synthesizing a Q-VASRS abstraction of a given loop, which (like our Q-VASR abstraction algorithm) synthesizes *best* abstractions under certain assumptions.

Finally, we note that our analysis techniques extend to complex control structures (such as nested loops) by employing summarization compositionally (i.e., "bottom-up"). For example, our analysis summarizes a nested loop by first summarizing its inner loops, and then uses the summaries to analyze the outer loop. As a result of compositionality, our analysis can be applied to partial programs, is easy to parallelize, and has the potential to scale to large code bases.

The main contributions of the paper are as follows:


#### **1.1 Outline**

This section illustrates the high-level structure of our invariant generation scheme. The goal is to compute a *transition formula* that summarizes the behavior of a given program. A transition formula is a formula over a set of program variables Var along with primed copies Var′, representing the state of the program before and after executing a computation (respectively).

```
procedure enqueue(elt):
  back := cons(elt,back)
  size := size + 1
procedure dequeue():
  if (front == nil) then
    // Reverse back, append to front
    while (back != nil) do
      front := cons(head(back),front)
      back := tail(back)
  result := head(front)
  front := tail(front)
  size := size - 1
  return result
```
(a) Persistent queue

```
procedure enqueue():
  back_len := back_len + 1
  mem_ops := mem_ops + 1
  size := size + 1
procedure dequeue():
  if (front_len == 0) then
    while (back_len != 0) do
      front_len := front_len + 1
      back_len := back_len - 1
      mem_ops := mem_ops + 3
  size := size - 1
  front_len := front_len - 1
  mem_ops := mem_ops + 2
procedure harness():
  nb_ops := 0
  while nondet() do
    nb_ops := nb_ops + 1
    if (size > 0 && nondet()) then
      dequeue()
    else
      enqueue()
```
(b) Integer model & harness

**Fig. 1.** A persistent queue and integer model. Variables front_len and back_len model the lengths of the lists front and back; mem_ops counts the number of memory operations in the computation.

For any given program $P$, a transition formula $\mathbf{TF}[\![P]\!]$ can be computed by recursion on syntax:<sup>1</sup>

$$\begin{array}{rcl} \mathbf{TF}[\![\mathtt{x} := e]\!] & \triangleq & \mathtt{x}' = e \wedge \bigwedge_{\mathtt{y} \neq \mathtt{x} \in \mathtt{Var}} \mathtt{y}' = \mathtt{y} \\ \mathbf{TF}[\![\mathtt{if}\ c\ \mathtt{then}\ P_1\ \mathtt{else}\ P_2]\!] & \triangleq & (c \wedge \mathbf{TF}[\![P_1]\!]) \vee (\neg c \wedge \mathbf{TF}[\![P_2]\!]) \\ \mathbf{TF}[\![P_1; P_2]\!] & \triangleq & \exists X.\, \mathbf{TF}[\![P_1]\!][\mathtt{Var}' \mapsto X] \wedge \mathbf{TF}[\![P_2]\!][\mathtt{Var} \mapsto X] \\ \mathbf{TF}[\![\mathtt{while}\ c\ \mathtt{do}\ P]\!] & \triangleq & (c \wedge \mathbf{TF}[\![P]\!])^{\star} \wedge (\neg c[\mathtt{Var} \mapsto \mathtt{Var}']) \end{array}$$

where $(-)^{\star}$ is a function that computes an over-approximation of the transitive closure of a transition formula. The contribution of this paper is a method for computing this $(-)^{\star}$ operation, which is based on first over-approximating the input transition formula by a Q-VASR, and then computing the (exact) reachability relation of the Q-VASR.

<sup>1</sup> This style of analysis can be extended from a simple block-structured language to one with control flow and recursive procedures using the framework of algebraic program analysis [13,23].

We illustrate the analysis on an integer model of a persistent queue data structure, pictured in Fig. 1. The example consists of two operations (enqueue and dequeue), as well as a test harness (harness) that non-deterministically executes enqueue and dequeue operations. The queue achieves O(1) amortized memory operations (mem_ops) in enqueue and dequeue by implementing the queue as two lists, front and back (whose lengths are modeled as front_len and back_len, respectively): the sequence of elements in the queue is the front list followed by the reverse of the back list. We will show that the queue functions use O(1) amortized memory operations by finding a summary for harness that implies a linear bound on mem_ops (the number of memory operations in the computation) in terms of nb_ops (the total number of enqueue/dequeue operations executed in some sequence of operations).

We analyze the queue compositionally, in "bottom-up" fashion (i.e., starting from deeply-nested code and working our way back up to a summary for harness). There are two loops of interest, one in dequeue and one in harness. Since the dequeue loop is nested inside the harness loop, dequeue is analyzed first. We start by computing a transition formula that represents one execution of the body of the dequeue loop:

$$\mathit{Body}_{\mathtt{deq}} = \mathtt{back\_len} > 0 \wedge \begin{pmatrix} \mathtt{front\_len}' = \mathtt{front\_len} + 1 \\ \wedge\ \mathtt{back\_len}' = \mathtt{back\_len} - 1 \\ \wedge\ \mathtt{mem\_ops}' = \mathtt{mem\_ops} + 3 \\ \wedge\ \mathtt{size}' = \mathtt{size} \end{pmatrix}$$

Observe that each variable in the loop is incremented by a constant value. As a result, the loop update can be captured faithfully by a vector addition system. In particular, we see that this loop body formula is simulated by the Q-VASR $V_{\mathtt{deq}}$ (below), where the correspondence between the state space of $\mathit{Body}_{\mathtt{deq}}$ and $V_{\mathtt{deq}}$ is given by the identity transformation (i.e., each dimension of $V_{\mathtt{deq}}$ simply represents one of the variables of $\mathit{Body}_{\mathtt{deq}}$).

$$
\begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \mathtt{front\_len} \\ \mathtt{back\_len} \\ \mathtt{mem\_ops} \\ \mathtt{size} \end{bmatrix}; \quad V_{\mathtt{deq}} = \left\{ \begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix} \to \begin{bmatrix} w+1 \\ x-1 \\ y+3 \\ z \end{bmatrix} \right\}
$$

A formula representing the reachability relation of a Q-VASR can be computed in polytime. For the case of $V_{\mathtt{deq}}$, a formula representing $k$ steps of the Q-VASR is simply

$$w' = w + k \wedge x' = x - k \wedge y' = y + 3k \wedge z' = z. \tag{\dagger}$$
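The closed form (†) can be cross-checked against direct iteration of the single transition of $V_{\mathtt{deq}}$. The Python sketch below (hypothetical names, random rational start states) does exactly that. It tests (†) on samples; it is not the polytime reachability procedure itself.

```python
import random
from fractions import Fraction

# V_deq: one transition adding (+1, -1, +3, 0) to (w, x, y, z) = (front_len, back_len, mem_ops, size)
DELTA_DEQ = (1, -1, 3, 0)


def k_steps(state, k):
    """Iterate the V_deq transition k times."""
    for _ in range(k):
        state = tuple(s + d for s, d in zip(state, DELTA_DEQ))
    return state


def closed_form(state, k):
    # (dagger): w' = w + k, x' = x - k, y' = y + 3k, z' = z
    w, x, y, z = state
    return (w + k, x - k, y + 3 * k, z)


random.seed(0)
for _ in range(100):
    s = tuple(Fraction(random.randint(-10, 10), random.randint(1, 5)) for _ in range(4))
    k = random.randint(0, 20)
    assert k_steps(s, k) == closed_form(s, k)
print("closed form agrees with simulation")
```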

To capture information about the pre-condition of the loop, we can project the primed variables to obtain $\mathtt{back\_len} > 0$; similarly, for the post-condition, we can project the unprimed variables to obtain $\mathtt{back\_len}' \ge 0$. Finally, combining (†) (translated back into the vocabulary of the program) and the pre/post-condition, we form the following approximation of the dequeue loop's behavior:

$$\exists k.k \ge 0 \land \begin{pmatrix} \mathtt{front\\_len'} = \mathtt{front\\_len} + k\\ \land \mathtt{back\\_len'} = \mathtt{back\\_len} - k\\ \land \mathtt{mem\\_ops'} = \mathtt{mem\\_ops} + 3k\\ \land \mathtt{size'} = \mathtt{size} \end{pmatrix} \land \begin{pmatrix} k > 0 \Rightarrow \begin{pmatrix} \mathtt{back\\_len} > 0\\ \land \mathtt{back\\_len'} \ge 0 \end{pmatrix} \end{pmatrix} \land$$

Using this summary for the dequeue loop, we proceed to compute a transition formula for the body of the harness loop (omitted for brevity). Just as with the dequeue loop, we analyze the harness loop by synthesizing a Q-VASR that simulates it, $V_{\mathtt{har}}$ (below), where the correspondence between the state space of the harness loop and $V_{\mathtt{har}}$ is given by the transformation $S_{\mathtt{har}}$:

$$
\underbrace{\begin{bmatrix} v \\ w \\ x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 3 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \mathtt{front\_len} \\ \mathtt{back\_len} \\ \mathtt{mem\_ops} \\ \mathtt{size} \\ \mathtt{nb\_ops} \end{bmatrix}}_{S_{\mathtt{har}}}
$$

$$
V_{\mathtt{har}} = \left\{ \underbrace{\begin{bmatrix} v \\ w \\ x \\ y \\ z \end{bmatrix} \to \begin{bmatrix} v+1 \\ w+1 \\ x+4 \\ y+1 \\ z+1 \end{bmatrix}}_{\text{enqueue}},\ \underbrace{\begin{bmatrix} v \\ w \\ x \\ y \\ z \end{bmatrix} \to \begin{bmatrix} v-1 \\ w \\ x+2 \\ y-1 \\ z+1 \end{bmatrix}}_{\text{dequeue fast}},\ \underbrace{\begin{bmatrix} v \\ w \\ x \\ y \\ z \end{bmatrix} \to \begin{bmatrix} v-1 \\ 0 \\ x+2 \\ y-1 \\ z+1 \end{bmatrix}}_{\text{dequeue slow}} \right\}
$$

Unlike the dequeue loop, we do not get an exact characterization of the dynamics of each changed variable. In particular, in the slow dequeue path through the loop, the values of `front_len`, `back_len`, and `mem_ops` change by a variable amount. Since `back_len` is set to 0, its behavior can be captured by a reset. The dynamics of `front_len` and `mem_ops` cannot be captured by a Q-VASR, but (using our dequeue summary) we can observe that the sum `front_len + back_len` is decremented by 1, and the sum `mem_ops + 3back_len` is incremented by 2.

We compute the following formula that captures the reachability relation of V_har (taking k₁ steps of enqueue, k₂ steps of dequeue fast, and k₃ steps of dequeue slow) under the inverse image of the state correspondence S_har:

$$\begin{pmatrix} \mathtt{size'} = \mathtt{size} + k\_1 - k\_2 - k\_3 \\ \land \big((k\_3 = 0 \land \mathtt{back\\_len'} = \mathtt{back\\_len} + k\_1) \lor (k\_3 > 0 \land 0 \le \mathtt{back\\_len'} \le k\_1)\big) \\ \land \mathtt{mem\\_ops'} + 3\,\mathtt{back\\_len'} = \mathtt{mem\\_ops} + 3\,\mathtt{back\\_len} + 4k\_1 + 2k\_2 + 2k\_3 \\ \land \mathtt{front\\_len'} + \mathtt{back\\_len'} = \mathtt{front\\_len} + \mathtt{back\\_len} + k\_1 - k\_2 - k\_3 \\ \land \mathtt{nb\\_ops'} = \mathtt{nb\\_ops} + k\_1 + k\_2 + k\_3 \end{pmatrix}$$

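From the above formula (along with pre/post-condition formulas), we obtain a summary for the harness loop (omitted for brevity). Using this summary we can prove (supposing that we start in a state where all variables are zero) that `mem_ops` is at most 4 times `nb_ops` (i.e., enqueue and dequeue use O(1) amortized memory operations): from the zero state, the summary gives `mem_ops′` = 4k₁ + 2k₂ + 2k₃ − 3`back_len′` and `nb_ops′` = k₁ + k₂ + k₃, so since k₂, k₃ ≥ 0 and `back_len′` ≥ 0 we obtain `mem_ops′` ≤ 4`nb_ops′`.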
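As a sanity check on this summary, the sketch below replays random runs of V_har from the all-zero state and tests both the reachability formula and the amortized bound. It rests on stated assumptions: the three transformers are read off the reachability formula above (enqueue adds (1, 1, 4, 1, 1); fast dequeue adds (−1, 0, 2, −1, 1); slow dequeue resets w and adds the same vector), the abstract state is ordered (v, w, x, y, z), and `mem_ops` is recovered as x − 3w. The names `V_HAR`, `random_run`, and friends are hypothetical; random testing illustrates the claim but proves nothing, which is why the symbolic summary is needed.

```python
import random

# Hypothetical encoding of V_har. Abstract state (v, w, x, y, z) stands for
#   (size, back_len, mem_ops + 3*back_len, front_len + back_len, nb_ops).
# A transformer (r, a) maps a state u to r*u + a (pointwise), as in Definition 1.
V_HAR = {
    "enqueue":      ((1, 1, 1, 1, 1), (1, 1, 4, 1, 1)),
    "dequeue_fast": ((1, 1, 1, 1, 1), (-1, 0, 2, -1, 1)),
    "dequeue_slow": ((1, 0, 1, 1, 1), (-1, 0, 2, -1, 1)),  # resets w (back_len)
}


def step(u, transformer):
    r, a = transformer
    return tuple(ri * ui + ai for ri, ui, ai in zip(r, u, a))


def random_run(n_steps, seed):
    """Run V_har from the zero state, counting how often each transformer fires."""
    rng = random.Random(seed)
    u = (0, 0, 0, 0, 0)
    counts = {name: 0 for name in V_HAR}
    for _ in range(n_steps):
        name = rng.choice(sorted(V_HAR))
        u = step(u, V_HAR[name])
        counts[name] += 1
    return u, counts


def satisfies_summary(u, counts):
    """Reachability formula of V_har with k1, k2, k3 the step counts."""
    v, w, x, y, z = u
    k1, k2, k3 = counts["enqueue"], counts["dequeue_fast"], counts["dequeue_slow"]
    back_ok = (k3 == 0 and w == k1) or (k3 > 0 and 0 <= w <= k1)
    return (v == k1 - k2 - k3 and back_ok and x == 4 * k1 + 2 * k2 + 2 * k3
            and y == k1 - k2 - k3 and z == k1 + k2 + k3)


def amortized_bound_holds(u):
    """mem_ops = x - 3*back_len must be at most 4 * nb_ops."""
    v, w, x, y, z = u
    return x - 3 * w <= 4 * z


for seed in range(3):
    u, counts = random_run(50, seed)
    print(seed, satisfies_summary(u, counts), amortized_bound_holds(u))
```

Each printed line should report `True True`; a `False` would indicate a mismatch between the transformers and the summary formula.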

#### **2 Background**

The syntax of ∃LIRA, the existential fragment of linear integer/rational arithmetic, is given by the following grammar:

$$\begin{aligned} &s, t \in \mathsf{Term} ::= c \mid x \mid s + t \mid c \cdot t \\ &F, G \in \mathsf{Formula} ::= s < t \mid s = t \mid F \wedge G \mid F \vee G \mid \exists x \in \mathbb{Q}. F \mid \exists x \in \mathbb{Z}. F \end{aligned}$$

where x is a (rational sorted) variable symbol and c is a rational constant. Observe that (without loss of generality) formulas are free of negation. ∃LRA (linear rational arithmetic) refers to the fragment of ∃LIRA that omits quantification over the integer sort.

A **transition system** is a pair (S, →) where S is a (potentially infinite) set of states and → ⊆ S × S is a transition relation. For a transition relation →, we use →<sup>∗</sup> to denote its reflexive, transitive closure.

A **transition formula** is a formula F(**x**, **x**′) whose free variables range over **x** = x<sub>1</sub>, ..., x<sub>n</sub> and **x**′ = x′<sub>1</sub>, ..., x′<sub>n</sub> (we refer to the number n as the *dimension* of F); these variables designate the state before and after a transition. In the following, we assume that transition formulas are defined over ∃LIRA. For a transition formula F(**x**, **x**′) and vectors of terms **s** and **t**, we use F(**s**, **t**) to denote the formula F with each x<sub>i</sub> replaced by s<sub>i</sub> and each x′<sub>i</sub> replaced by t<sub>i</sub>. A transition formula F(**x**, **x**′) defines a transition system (S<sub>F</sub>, →<sub>F</sub>), where the state space S<sub>F</sub> is ℚ<sup>n</sup> and which can transition **u** →<sub>F</sub> **v** iff F(**u**, **v**) is valid.

For two rational vectors **a** and **b** of the same dimension d, we use **a** · **b** to denote the inner product **a** · **b** = Σ<sup>d</sup><sub>i=1</sub> a<sub>i</sub>b<sub>i</sub> and **a** ∗ **b** to denote the pointwise (aka Hadamard) product (**a** ∗ **b**)<sub>i</sub> = a<sub>i</sub>b<sub>i</sub>. For any natural number i, we use **e**<sub>i</sub> to denote the standard basis vector in the ith direction (i.e., the vector consisting of all zeros except the ith entry, which is 1), where the dimension of **e**<sub>i</sub> is understood from context. We use I<sub>n</sub> to denote the n × n identity matrix.

**Definition 1.** *A rational vector addition system with resets (Q-VASR) of dimension d is a finite set V ⊆ {0, 1}<sup>d</sup> × ℚ<sup>d</sup> of transformers. Each transformer (**r**, **a**) ∈ V consists of a binary reset vector **r** and a rational addition vector **a**, both of dimension d. V defines a transition system (S<sub>V</sub>, →<sub>V</sub>), where the state space S<sub>V</sub> is ℚ<sup>d</sup> and which can transition **u** →<sub>V</sub> **v** iff **v** = **r** ∗ **u** + **a** for some (**r**, **a**) ∈ V.*
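The transition relation of Definition 1 is direct to execute. Below is a minimal Python sketch, assuming transformers are represented as pairs of tuples and rationals as `fractions.Fraction`; the function names `hadamard`, `apply`, and `successors` are our own, and the two-transformer Q-VASR is a hypothetical example.

```python
from fractions import Fraction


def hadamard(r, u):
    """Pointwise (Hadamard) product r * u."""
    return tuple(ri * ui for ri, ui in zip(r, u))


def apply(transformer, u):
    """Successor r * u + a of the state u under one transformer (r, a)."""
    r, a = transformer
    return tuple(x + ai for x, ai in zip(hadamard(r, u), a))


def successors(V, u):
    """All states v such that u ->_V v."""
    return {apply(t, u) for t in V}


# Hypothetical 2-dimensional Q-VASR with two transformers:
#   - add (1, 1/2) without resetting anything;
#   - reset the first coordinate to 0 and add 3 to the second.
V = {
    ((1, 1), (Fraction(1), Fraction(1, 2))),
    ((0, 1), (Fraction(0), Fraction(3))),
}
succ = successors(V, (Fraction(2), Fraction(5)))  # contains (3, 11/2) and (0, 8)
```

Because the reset vector is binary, a 0 entry discards the old coordinate and a 1 entry keeps it, so resets and increments share a single update rule.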

**Definition 2.** *A rational vector addition system with resets and states (Q-VASRS) of dimension d is a pair V = (Q, E), where Q is a finite set of control states, and E ⊆ Q × {0, 1}<sup>d</sup> × ℚ<sup>d</sup> × Q is a finite set of edges labeled by (d-dimensional) transformers. V defines a transition system (S<sub>V</sub>, →<sub>V</sub>), where the state space S<sub>V</sub> is Q × ℚ<sup>d</sup> and which can transition (q<sub>1</sub>, **u**) →<sub>V</sub> (q<sub>2</sub>, **v**) iff there is some edge (q<sub>1</sub>, (**r**, **a**), q<sub>2</sub>) ∈ E such that **v** = **r** ∗ **u** + **a**.*

Our invariant generation scheme is based on the following result, which is a simple consequence of the work of Haase and Halfon:

**Theorem 1 (**[8]**).** *There is a polytime algorithm which, given a d-dimensional Q-VASRS V = (Q, E), computes an ∃LIRA transition formula reach(V) such that for all **u**, **v** ∈ ℚ<sup>d</sup>, we have (p, **u**) →<sup>∗</sup><sub>V</sub> (q, **v**) for some control states p, q ∈ Q if and only if **u** →<sub>reach(V)</sub> **v**.*

Note that Q-VASR can be realized as Q-VASRS with a single control state, so this theorem also applies to Q-VASR.

#### **3 Approximating Loops with Vector Addition Systems**

In this section, we describe a method for over-approximating the transitive closure of a transition formula using a Q-VASR. This procedure immediately extends to computing summaries for programs (including programs with nested loops) using the method outlined in Sect. 1.1.

The core algorithmic problem that we answer in this section is: *given a transition formula, how can we synthesize a (best) abstraction of that formula's dynamics as a* Q-VASR*?* We begin by formalizing the problem: in particular, we define what it means for a Q-VASR to simulate a transition formula and what it means for an abstraction to be "best."

**Definition 3.** *Let A = (ℚ<sup>n</sup>, →<sub>A</sub>) and B = (ℚ<sup>m</sup>, →<sub>B</sub>) be transition systems operating over rational vector spaces. A linear simulation from A to B is a linear transformation S ∈ ℚ<sup>m×n</sup> such that for all **u**, **v** ∈ ℚ<sup>n</sup> for which **u** →<sub>A</sub> **v**, we have S**u** →<sub>B</sub> S**v**. We use A ⊩<sub>S</sub> B to denote that S is a linear simulation from A to B.*

Suppose that F(**x**, **x**′) is an n-dimensional transition formula, V is a d-dimensional Q-VASR, and S ∈ ℚ<sup>d×n</sup> is a linear transformation. The key property of simulations that underlies our loop summarization scheme is that if F ⊩<sub>S</sub> V, then *reach*(V)(S**x**, S**x**′) (i.e., the reachability relation of V under the inverse image of S) over-approximates the transitive closure of F. Finally, we observe that the simulation F ⊩<sub>S</sub> V can equivalently be defined by the validity of the entailment F ⊨ γ(S, V), where

$$\gamma(S, V) \stackrel{\triangle}{=} \bigvee\_{(\mathbf{r}, \mathbf{a}) \in V} S\mathbf{x}' = \mathbf{r} \* S\mathbf{x} + \mathbf{a}$$

is a transition formula that represents the transitions that V simulates under transformation S.
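Checking a *single* concrete transition against γ(S, V) is plain arithmetic, and the sketch below does exactly that. Deciding the entailment F ⊨ γ(S, V) for a whole formula needs an SMT solver and is not attempted here. Matrices are represented as lists of rows; the function name `in_gamma` and the example abstraction are hypothetical.

```python
from fractions import Fraction


def matvec(S, u):
    """The vector S u for a matrix S given as a list of rows."""
    return tuple(sum(Fraction(s) * x for s, x in zip(row, u)) for row in S)


def in_gamma(S, V, u, v):
    """Does the concrete transition (u, v) satisfy gamma(S, V)?

    That is: is S v = r * S u + a for some transformer (r, a) in V?"""
    Su, Sv = matvec(S, u), matvec(S, v)
    return any(
        all(sv == ri * su + ai for sv, ri, su, ai in zip(Sv, r, Su, a))
        for r, a in V
    )


# Hypothetical abstraction over (x, y): track x + y and y, where one step
# increases x + y by 1 and leaves y unchanged.
S = [[1, 1], [0, 1]]
V = [((1, 1), (1, 0))]
print(in_gamma(S, V, (0, 0), (1, 0)))  # True
print(in_gamma(S, V, (0, 0), (0, 1)))  # False: y changed
```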

Our task is to synthesize a linear transformation S and a Q-VASR V such that F ⊩<sub>S</sub> V. We call a pair (S, V), consisting of a rational matrix S ∈ ℚ<sup>d×n</sup> and a d-dimensional Q-VASR V, a Q-**VASR abstraction**. We say that n is the *concrete dimension* of (S, V) and d is the *abstract dimension*. If F ⊩<sub>S</sub> V, then we say that (S, V) is a Q-**VASR abstraction of** F. A transition formula may have many Q-VASR abstractions; we are interested in computing a Q-VASR abstraction (S, V) that results in the most precise over-approximation of the transitive closure of F. Towards this end, we define a preorder ⊑ on Q-VASR abstractions, where (S¹, V¹) ⊑ (S², V²) iff there exists a linear transformation T ∈ ℚ<sup>e×d</sup> such that V¹ ⊩<sub>T</sub> V² and TS¹ = S² (where d and e are the abstract dimensions of (S¹, V¹) and (S², V²), respectively). Observe that if (S¹, V¹) ⊑ (S², V²), then *reach*(V¹)(S¹**x**, S¹**x**′) ⊨ *reach*(V²)(S²**x**, S²**x**′).

Thus, our problem can be stated as follows: given a transition formula F, synthesize a Q-VASR abstraction (S, V) of F such that (S, V) is *best* in the sense that (S, V) ⊑ (S̃, Ṽ) for any Q-VASR abstraction (S̃, Ṽ) of F. A solution to this problem is given in Algorithm 1.


Algorithm 1 follows the familiar pattern of an AllSat-style loop. The algorithm takes as input a transition formula F. It maintains a Q-VASR abstraction (S, V) and a formula Γ, whose models correspond to the transitions of F that are *not* simulated by (S, V). The idea is to build (S, V) iteratively by sampling transitions from Γ, augmenting (S, V) to simulate the sample transition, and then updating Γ accordingly. We initialize (S, V) to be (I<sub>n</sub>, ∅), the canonical least Q-VASR abstraction in ⊑ order, and Γ to be F (i.e., (I<sub>n</sub>, ∅) does not simulate any transitions of F). Each loop iteration proceeds as follows. First, we sample a model M of Γ (i.e., a transition that is allowed by F but not simulated by (S, V)). We then generalize that transition to a set of transitions by using M to select a cube C of the DNF of F that contains M. Next, we use the procedure described in Sect. 3.1 to compute a Q-VASR abstraction α̂(C) that simulates the transitions of C. We then update the Q-VASR abstraction (S, V) to be the least upper bound of (S, V) and α̂(C) (w.r.t. ⊑ order) using the procedure described in Sect. 3.2 (line 7). Finally, we block any transition simulated by the least upper bound (including every transition in C) from being sampled again by conjoining ¬γ(S, V) to Γ. The loop terminates when Γ is unsatisfiable, in which case we have that F ⊩<sub>S</sub> V. Theorem 2 gives the correctness statement for this algorithm.

**Theorem 2.** *Given a transition formula F, Algorithm 1 computes a simulation S and Q-VASR V such that F ⊩<sub>S</sub> V. Moreover, if F is in ∃LRA, Algorithm 1 computes a* best *Q-VASR abstraction of F.*

The proof of this theorem as well as the proofs to all subsequent theorems, lemmas, and propositions are in the extended version of this paper [20].

#### **3.1 Abstracting Conjunctive Transition Formulas**

This section shows how to compute a Q-VASR abstraction for a consistent *conjunctive* formula. When the input formula is in ∃LRA, the computed Q-VASR abstraction will be a best Q-VASR abstraction of the input formula. The intuition is that, since ∃LRA is a convex theory, a best Q-VASR abstraction consists of a single transition. For ∃LIRA formulas, our procedure produces a Q-VASR abstraction that is not guaranteed to be best, precisely because ∃LIRA is not convex.

Let C be a consistent, conjunctive transition formula. Observe that the set *Res*<sub>C</sub> ≜ {⟨**s**, a⟩ : C ⊨ **s** · **x**′ = a}, which represents linear combinations of variables that are *reset* across C, forms a vector space. Similarly, the set *Inc*<sub>C</sub> ≜ {⟨**s**, a⟩ : C ⊨ **s** · **x**′ = **s** · **x** + a}, which represents linear combinations of variables that are *incremented* across C, forms a vector space. We compute bases for both *Res*<sub>C</sub> and *Inc*<sub>C</sub>, say {⟨**s**<sub>1</sub>, a<sub>1</sub>⟩, ..., ⟨**s**<sub>m</sub>, a<sub>m</sub>⟩} and {⟨**s**<sub>m+1</sub>, a<sub>m+1</sub>⟩, ..., ⟨**s**<sub>d</sub>, a<sub>d</sub>⟩}, respectively. We define α̂(C) to be the Q-VASR abstraction α̂(C) ≜ (S, {(**r**, **a**)}), where

$$S \stackrel{\scriptstyle \Delta}{=} \begin{bmatrix} \mathbf{s}\_1 \\ \vdots \\ \mathbf{s}\_d \end{bmatrix} \quad \mathbf{r} \stackrel{\scriptstyle \Delta}{=} [\underbrace{0 \cdots 0}\_{m \text{ times}} \overbrace{1 \cdots 1}^{(d-m)\text{ times}}] \quad \mathbf{a} \stackrel{\scriptstyle \Delta}{=} \begin{bmatrix} a\_1 \\ \vdots \\ a\_d \end{bmatrix} .$$

*Example 1.* Let C be the formula x′ = x + y ∧ y′ = 2y ∧ w′ = w ∧ w′ = z + 1 ∧ z′ = w. The vector space of resets has basis {⟨[0 0 −1 1], 0⟩} (representing that z − w is reset to 0). The vector space of increments has basis {⟨[1 −1 0 0], 0⟩, ⟨[0 0 1 0], 0⟩, ⟨[0 0 −1 1], 1⟩} (representing that the difference x − y does not change, the variable w does not change, and the difference z − w increases by 1). A best abstraction of C is thus the four-dimensional Q-VASR

$$V = \left\{ \left( \begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \right) \right\}, S = \begin{bmatrix} 0 & 0 & -1 & 1 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix}.$$

In particular, notice that since the term z − w is both incremented and reset, it is represented by two different dimensions in α̂(C).
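The following sketch samples random transitions of the formula C from Example 1 and confirms that each one satisfies γ(S, V) for the abstraction (S, V) given above. We assume the variable order (x, y, w, z) and read C as x′ = x + y ∧ y′ = 2y ∧ w′ = w ∧ w′ = z + 1 ∧ z′ = w, which is the reading consistent with the reset and increment bases of the example. Sampling only tests soundness on examples; it does not check that the abstraction is best. The helper names are hypothetical.

```python
import random
from fractions import Fraction

# Example 1, variables ordered (x, y, w, z). Assumed reading of C:
#   x' = x + y  /\  y' = 2y  /\  w' = w  /\  w' = z + 1  /\  z' = w
S = [[0, 0, -1, 1],   # z - w, reset to 0
     [1, -1, 0, 0],   # x - y, incremented by 0
     [0, 0, 1, 0],    # w,     incremented by 0
     [0, 0, -1, 1]]   # z - w, incremented by 1
V = [((0, 1, 1, 1), (0, 0, 0, 1))]


def matvec(M, u):
    return tuple(sum(m * x for m, x in zip(row, u)) for row in M)


def in_gamma(S, V, u, v):
    """Is S v = r * S u + a for some (r, a) in V?"""
    Su, Sv = matvec(S, u), matvec(S, v)
    return any(all(sv == ri * su + ai for sv, ri, su, ai in zip(Sv, r, Su, a))
               for r, a in V)


def sample_transition(rng):
    """A random transition (u, u') of C with rational x, y, z."""
    x, y, z = (Fraction(rng.randint(-50, 50), rng.randint(1, 9)) for _ in range(3))
    w = z + 1                      # forced by w' = w and w' = z + 1
    return (x, y, w, z), (x + y, 2 * y, w, w)


rng = random.Random(0)
print(all(in_gamma(S, V, *sample_transition(rng)) for _ in range(1000)))  # True
```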

**Proposition 1.** *For any consistent, conjunctive transition formula C, α̂(C) is a Q-VASR abstraction of C. If C is expressed in ∃LRA, then α̂(C) is best.*

#### **3.2 Computing Least Upper Bounds**

This section shows how to compute least upper bounds w.r.t. the ⊑ order.

By definition of the ⊑ order, if (S, V) is an upper bound of (S¹, V¹) and (S², V²), then there must exist matrices T¹ and T² such that T¹S¹ = S = T²S², V¹ ⊩<sub>T¹</sub> V, and V² ⊩<sub>T²</sub> V. As we shall see, if (S, V) is a *least* upper bound, then it is completely determined by the matrices T¹ and T². Thus, we shift our attention to computing simulation matrices T¹ and T² that induce a least upper bound.

In view of the desired equation T¹S¹ = S = T²S², let us consider the constraint T¹S¹ = T²S² on two *unknown* matrices T¹ and T². Clearly, we have T¹S¹ = T²S² iff each pair of rows (T¹<sub>i</sub>, T²<sub>i</sub>) belongs to the set 𝒯 ≜ {(**t**¹, **t**²) : **t**¹S¹ = **t**²S²}. Observe that 𝒯 is a vector space, so there is a *best* solution to the constraint T¹S¹ = T²S²: choose T¹ and T² so that the set of all row pairs (T¹<sub>i</sub>, T²<sub>i</sub>) forms a basis for 𝒯. In the following, we use *pushout*(S¹, S²) to denote a function that computes such a *best* (T¹, T²).
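One way to realize *pushout* is to compute a basis of the null space of the linear system **t**¹S¹ − **t**²S² = 0 in the unknowns (**t**¹, **t**²). The sketch below does this with exact Gauss-Jordan elimination over `fractions.Fraction`. The helper name `nullspace` and the list-of-rows matrix representation are our own choices; any exact null-space routine would serve equally well.

```python
from fractions import Fraction


def nullspace(M, ncols):
    """Basis of {v : M v = 0} over the rationals, via Gauss-Jordan elimination."""
    A = [[Fraction(x) for x in row] for row in M]
    pivots, r = [], 0
    for c in range(ncols):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in column c: c is a free column
        A[r], A[p] = A[p], A[r]
        piv = A[r][c]
        A[r] = [x / piv for x in A[r]]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                f = A[i][c]
                A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[free] = Fraction(1)
        for row, pc in zip(A, pivots):    # the first len(pivots) rows are pivot rows
            v[pc] = -row[free]
        basis.append(v)
    return basis


def pushout(S1, S2):
    """Matrices (T1, T2) whose row pairs form a basis of {(t1, t2) : t1 S1 = t2 S2}."""
    k1, k2 = len(S1), len(S2)
    n = len(S1[0]) if S1 else len(S2[0])
    # Unknowns: the k1 + k2 entries of (t1, t2); one equation per column of S1 / S2.
    M = [[S1[i][c] for i in range(k1)] + [-S2[j][c] for j in range(k2)]
         for c in range(n)]
    basis = nullspace(M, k1 + k2)
    return [v[:k1] for v in basis], [v[k1:] for v in basis]


# S1 tracks x and y separately; S2 tracks only x + y.
T1, T2 = pushout([[1, 0], [0, 1]], [[1, 1]])
print(T1, T2)   # one shared row: x + y
```

When S¹ and S² share no linear information (for example, one tracks only x and the other only y), the null space is trivial and *pushout* returns empty matrices.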

While *pushout* gives a *best* solution to the equation T¹S¹ = T²S², it is not sufficient for the purpose of computing least upper bounds for Q-VASR abstractions, because T¹ and T² may not respect the structure of the Q-VASR V¹ and V² (i.e., there may be no Q-VASR V such that V¹ ⊩<sub>T¹</sub> V and V² ⊩<sub>T²</sub> V). Thus, we must further constrain our problem by requiring that T¹ and T² are *coherent* with respect to V¹ and V² (respectively).

**Definition 4.** *Let V be a d-dimensional Q-VASR. We say that i, j ∈ {1, ..., d} are coherent dimensions of V if for all transitions (**r**, **a**) ∈ V we have r<sub>i</sub> = r<sub>j</sub> (i.e., every transition of V that resets i also resets j and vice versa). We denote that i and j are coherent dimensions of V by writing i ≡<sub>V</sub> j, and observe that ≡<sub>V</sub> forms an equivalence relation on {1, ..., d}. We refer to the equivalence classes of ≡<sub>V</sub> as the coherence classes of V.*

*A matrix T ∈ ℚ<sup>e×d</sup> is coherent with respect to V if and only if each of its rows has non-zero values only in the dimensions corresponding to a single coherence class of V.*
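Coherence classes can be read off by grouping dimensions whose reset bits agree across every transformer. The sketch below assumes a Q-VASR is an iterable of (reset, addition) tuples and uses 0-based dimensions (the text numbers dimensions from 1); the function names are hypothetical.

```python
def coherence_classes(V, d):
    """Equivalence classes of i ~ j, where r[i] == r[j] for every (r, a) in V."""
    classes = {}
    for i in range(d):
        signature = tuple(r[i] for r, _ in V)   # reset bits of dimension i
        classes.setdefault(signature, []).append(i)
    return list(classes.values())


def is_coherent(T, V, d):
    """Every row of T is non-zero only inside a single coherence class of V."""
    class_of = {i: k for k, cls in enumerate(coherence_classes(V, d)) for i in cls}
    return all(len({class_of[j] for j, t in enumerate(row) if t != 0}) <= 1
               for row in T)


# Example 1's Q-VASR: dimension 0 is reset, dimensions 1-3 are incremented.
V = [((0, 1, 1, 1), (0, 0, 0, 1))]
print(coherence_classes(V, 4))                 # [[0], [1, 2, 3]]
print(is_coherent([[0, 1, -1, 2]], V, 4))      # True
print(is_coherent([[1, 1, 0, 0]], V, 4))       # False
```

With no transformers at all, every dimension has the same (empty) signature, so all dimensions form one class; this matches the vacuous reading of Definition 4.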

For any d-dimensional Q-VASR V and coherence class C = {c<sub>1</sub>, ..., c<sub>k</sub>} of V, define Π<sub>C</sub> to be the k × d matrix whose rows are **e**<sub>c<sub>1</sub></sub>, ..., **e**<sub>c<sub>k</sub></sub>. Intuitively, Π<sub>C</sub> is a projection onto the set of dimensions in C.

Coherence is a necessary and sufficient condition for linear simulations between Q-VASR in a sense described in Lemmas 1 and 2.

**Lemma 1.** *Let V¹ and V² be Q-VASR (of dimension d and e, respectively), and let T ∈ ℚ<sup>e×d</sup> be a matrix such that V¹ ⊩<sub>T</sub> V². Then T must be coherent with respect to V¹.*

**Algorithm 2.** (S¹, V¹) ⊔ (S², V²)

**input:** Normal Q-VASR abstractions (S¹, V¹) and (S², V²) of equal concrete dimension

**output:** Least upper bound (w.r.t. ⊑) of (S¹, V¹) and (S², V²)

1. S, T¹, T² ← empty matrices;
2. **foreach** coherence class C¹ of V¹ **do**
3. **foreach** coherence class C² of V² **do**
4. (U¹, U²) ← *pushout*(Π<sub>C¹</sub>S¹, Π<sub>C²</sub>S²);
5. S ← [S; U¹Π<sub>C¹</sub>S¹]; T¹ ← [T¹; U¹Π<sub>C¹</sub>]; T² ← [T²; U²Π<sub>C²</sub>] (where [A; B] stacks the rows of B below those of A);
6. V ← *image*(V¹, T¹) ∪ *image*(V², T²);
7. **return** (S, V)

Let V be a d-dimensional Q-VASR and let T ∈ ℚ<sup>e×d</sup> be a matrix that is coherent with respect to V and has no zero rows. Then there is a (unique) e-dimensional Q-VASR *image*(V, T) whose transition relation →<sub>*image*(V,T)</sub> is equal to {(T**u**, T**v**) : **u** →<sub>V</sub> **v**} (the image of V's transition relation under T). This Q-VASR can be defined by:

$$image(V, T) \triangleq \{ (T \boxtimes \mathbf{r}, T\mathbf{a}) : (\mathbf{r}, \mathbf{a}) \in V \}$$

where T ⊠ **r** is the reset vector **r** translated along T (i.e., (T ⊠ **r**)<sub>i</sub> = r<sub>j</sub>, where j is an arbitrary choice among dimensions for which T<sub>ij</sub> is non-zero; at least one such j exists because the row T<sub>i</sub> is non-zero by assumption, and the choice of j is arbitrary because all such j belong to the same coherence class by the assumption that T is coherent with respect to V).
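The following sketch implements *image*(V, T) as just defined, with T ⊠ **r** reading the reset bit from the first column j in which row i of T is non-zero, and then checks Lemma 2 on one concrete transition of Example 1's Q-VASR. The matrix T is a hypothetical coherent matrix chosen for illustration, and the helper names are our own.

```python
def image(V, T):
    """image(V, T) = {(T boxtimes r, T a) : (r, a) in V}.

    Assumes T is coherent w.r.t. V and has no zero rows, so reading r_j from
    the first non-zero column j of each row is well-defined."""
    result = set()
    for r, a in V:
        r_img = tuple(r[next(j for j, t in enumerate(row) if t != 0)] for row in T)
        a_img = tuple(sum(t * x for t, x in zip(row, a)) for row in T)
        result.add((r_img, a_img))
    return result


def apply(transformer, u):
    r, a = transformer
    return tuple(ri * ui + ai for ri, ui, ai in zip(r, u, a))


def matvec(T, u):
    return tuple(sum(t * x for t, x in zip(row, u)) for row in T)


# Example 1's Q-VASR: dimension 0 is reset, dimensions 1-3 are incremented.
V = {((0, 1, 1, 1), (0, 0, 0, 1))}
T = [[0, 1, 1, 0],   # mixes dimensions 1 and 2 (same coherence class)
     [1, 0, 0, 0]]   # keeps dimension 0 on its own
W = image(V, T)      # {((1, 0), (0, 0))}

# Lemma 2 on one transition: if u ->_V v then T u ->_W T v.
u = (5, 2, 3, 4)
(v,) = {apply(t, u) for t in V}
print(matvec(T, v) in {apply(t, matvec(T, u)) for t in W})   # True
```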

**Lemma 2.** *Let V be a d-dimensional Q-VASR and let T ∈ ℚ<sup>e×d</sup> be a matrix that is coherent with respect to V and has no zero rows. Then the transition relation of image(V, T) is the image of V's transition relation under T (i.e., →<sub>image(V,T)</sub> is equal to {(T**u**, T**v**) : **u** →<sub>V</sub> **v**}).*

Finally, prior to describing our least upper bound algorithm, we must define a technical condition that is both assumed and preserved by the procedure:

**Definition 5.** *A* Q*-VASR abstraction* (S, V ) *is normal if there is no non-zero vector* **z** *that is coherent with respect to* V *such that* **z**S = 0 *(i.e., the rows of* S *that correspond to any coherence class of* V *are linearly independent).*

Intuitively, a Q-VASR abstraction that is *not* normal contains information that is either inconsistent or redundant.

We now present a strategy for computing least upper bounds of Q-VASR abstractions. Fix (normal) Q-VASR abstractions (S¹, V¹) and (S², V²). Lemmas 1 and 2 together show that a pair of matrices T̃¹ and T̃² induces an upper bound (not necessarily *least*) on (S¹, V¹) and (S², V²) exactly when the following conditions hold: (1) T̃¹S¹ = T̃²S², (2) T̃¹ is coherent w.r.t. V¹, (3) T̃² is coherent w.r.t. V², and (4) neither T̃¹ nor T̃² contains zero rows. The upper bound induced by T̃¹ and T̃² is given by

$$
ub(\widetilde{T}^1, \widetilde{T}^2) \triangleq (\widetilde{T}^1 S^1, image(V^1, \widetilde{T}^1) \cup image(V^2, \widetilde{T}^2)).
$$

We now consider how to compute a *best* such T̃¹ and T̃². Observe that conditions (1), (2), and (3) hold exactly when for each row i, (T̃¹<sub>i</sub>, T̃²<sub>i</sub>) belongs to the set

$$\mathcal{T} \triangleq \{ (\mathbf{t}^1, \mathbf{t}^2) : \mathbf{t}^1 S^1 = \mathbf{t}^2 S^2 \land \mathbf{t}^1 \text{ coherent w.r.t. } V^1 \land \mathbf{t}^2 \text{ coherent w.r.t. } V^2 \}.$$

Since a row vector **t**<sup>i</sup> is coherent w.r.t. V<sup>i</sup> iff its non-zero positions belong to the same coherence class of V<sup>i</sup> (equivalently, **t**<sup>i</sup> = **u**Π<sub>C<sup>i</sup></sub> for some coherence class C<sup>i</sup> of V<sup>i</sup> and vector **u**), we have 𝒯 = ⋃<sub>C¹,C²</sub> 𝒯(C¹, C²), where the union is over all coherence classes C¹ of V¹ and C² of V², and

$$\mathcal{T}(C^1, C^2) \triangleq \{ (\mathbf{u}^1 \boldsymbol{\Pi}\_{C^1}, \mathbf{u}^2 \boldsymbol{\Pi}\_{C^2}) : \mathbf{u}^1 \boldsymbol{\Pi}\_{C^1} S^1 = \mathbf{u}^2 \boldsymbol{\Pi}\_{C^2} S^2 \}.$$

Observe that each 𝒯(C¹, C²) is a vector space, so we can compute a pair of matrices T¹ and T² whose row pairs (T¹<sub>i</sub>, T²<sub>i</sub>) contain a basis for 𝒯(C¹, C²) for every pair of coherence classes C¹ and C². Since (S¹, V¹) and (S², V²) are normal (by assumption), neither T¹ nor T² can contain zero rows (so condition (4) is satisfied). Finally, *ub*(T¹, T²) is the *least* upper bound of (S¹, V¹) and (S², V²). Algorithm 2 is a straightforward realization of this strategy.
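Putting the pieces together, here is a self-contained sketch of Algorithm 2. It assumes matrices are lists of rows, Q-VASR are sets of (reset, addition) tuples, both abstract dimensions are positive, and both inputs are normal (so no zero rows arise and *image* is well-defined). *pushout* is computed by exact Gauss-Jordan elimination, which is one possible choice and not necessarily how the authors implement it; all function names are hypothetical.

```python
from fractions import Fraction


def nullspace(M, ncols):
    """Basis of {v : M v = 0} over the rationals (Gauss-Jordan elimination)."""
    A = [[Fraction(x) for x in row] for row in M]
    pivots, r = [], 0
    for c in range(ncols):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        piv = A[r][c]
        A[r] = [x / piv for x in A[r]]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                f = A[i][c]
                A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[free] = Fraction(1)
        for row, pc in zip(A, pivots):
            v[pc] = -row[free]
        basis.append(v)
    return basis


def pushout(S1, S2):
    """Row pairs forming a basis of {(t1, t2) : t1 S1 = t2 S2}."""
    k1, k2, n = len(S1), len(S2), len(S1[0])
    M = [[S1[i][c] for i in range(k1)] + [-S2[j][c] for j in range(k2)]
         for c in range(n)]
    basis = nullspace(M, k1 + k2)
    return [v[:k1] for v in basis], [v[k1:] for v in basis]


def coherence_classes(V, d):
    """Group dimensions whose reset bits agree in every transformer of V."""
    classes = {}
    for i in range(d):
        classes.setdefault(tuple(r[i] for r, _ in V), []).append(i)
    return list(classes.values())


def image(V, T):
    """image(V, T); assumes T is coherent w.r.t. V and has no zero rows."""
    out = set()
    for r, a in V:
        r_img = tuple(r[next(j for j, t in enumerate(row) if t != 0)] for row in T)
        a_img = tuple(sum(t * x for t, x in zip(row, a)) for row in T)
        out.add((r_img, a_img))
    return out


def lub(S1, V1, S2, V2):
    """Sketch of Algorithm 2: returns (S, V) together with T1 and T2."""
    d1, d2, n = len(S1), len(S2), len(S1[0])
    S, T1, T2 = [], [], []
    for C1 in coherence_classes(V1, d1):
        P1 = [S1[i] for i in C1]                     # Pi_{C1} S1
        for C2 in coherence_classes(V2, d2):
            P2 = [S2[i] for i in C2]                 # Pi_{C2} S2
            U1, U2 = pushout(P1, P2)
            for u1, u2 in zip(U1, U2):
                # New abstract dimension: u1 Pi_{C1} S1 (equal to u2 Pi_{C2} S2).
                S.append([sum(c * row[k] for c, row in zip(u1, P1)) for k in range(n)])
                T1.append([u1[C1.index(i)] if i in C1 else 0 for i in range(d1)])
                T2.append([u2[C2.index(i)] if i in C2 else 0 for i in range(d2)])
    return S, image(V1, T1) | image(V2, T2), T1, T2


# Join "x += 1 and y += 1" with "x + y += 2": the common view is x + y += 2.
S, V, T1, T2 = lub([[1, 0], [0, 1]], {((1, 1), (1, 1))}, [[1, 1]], {((1,), (2,))})
print(S, V)
```

In the example, the first abstraction tracks x and y separately while the second only tracks x + y, so the least upper bound keeps the single shared dimension x + y with increment 2.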

**Proposition 2.** *Let (S¹, V¹) and (S², V²) be normal Q-VASR abstractions of equal concrete dimension. Then the Q-VASR abstraction (S, V) computed by Algorithm 2 is normal and is a least upper bound of (S¹, V¹) and (S², V²).*

#### **4 Control Flow and Q-VASRS**

In this section, we give a method for improving the precision of our loop summarization technique by using Q-VASRS; that is, Q-VASR extended with control states. While Q-VASRs over-approximate control flow using non-determinism, Q-VASRSs allow us to analyze phenomena such as oscillating and multi-phase loops.

**Fig. 2.** An oscillating loop and its representation as a Q-VASR and Q-VASRS.

We begin with an example that demonstrates the precision gained by Q-VASRS. The loop in Fig. 2a oscillates between (1) incrementing variable i by 1 and (2) incrementing both variables i and x by 1. Suppose that we wish to prove that, starting with the configuration x = 0 ∧ i = 1, the loop maintains the invariant that 2x ≤ i. The (best) Q-VASR abstraction of the loop, pictured in Fig. 2b, over-approximates the control flow of the loop by treating the conditional branch in the loop as a non-deterministic branch. This over-approximation may violate the invariant 2x ≤ i by repeatedly executing the path where both variables are incremented. On the other hand, the Q-VASRS abstraction of the loop pictured in Fig. 2c captures the understanding that the loop must oscillate between the two paths. The loop summary obtained from the reachability relation of this Q-VASRS is powerful enough to prove that the invariant 2x ≤ i holds (under the precondition x = 0 ∧ i = 1).
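The difference between the Q-VASR and Q-VASRS abstractions of the oscillating loop can be illustrated by bounded exploration. This sketch is hypothetical in two ways: it encodes the two loop paths directly over (i, x) rather than using a synthesized abstraction, and it explores only a fixed number of steps instead of computing *reach* as in Theorem 1. Because we do not assume which parity of i selects which path, the Q-VASRS runs may start in either control state.

```python
# States are (i, x); a transformer (r, a) maps u to r*u + a.
PATH1 = ((1, 1), (1, 0))   # path (1): i += 1
PATH2 = ((1, 1), (1, 1))   # path (2): i += 1, x += 1


def apply(transformer, u):
    r, a = transformer
    return tuple(ri * ui + ai for ri, ui, ai in zip(r, u, a))


# Q-VASR: a single implicit control state, so either path may fire at any time.
VASR = [PATH1, PATH2]
# Q-VASRS: two control states force the two paths to alternate.
VASRS_EDGES = [("A", PATH1, "B"), ("B", PATH2, "A")]


def reach_vasr(u0, depth):
    """Vectors reachable from u0 in at most `depth` steps of the Q-VASR."""
    frontier, seen = {u0}, {u0}
    for _ in range(depth):
        frontier = {apply(t, u) for u in frontier for t in VASR}
        seen |= frontier
    return seen


def reach_vasrs(starts, depth):
    """(control state, vector) pairs reachable in at most `depth` steps."""
    frontier, seen = set(starts), set(starts)
    for _ in range(depth):
        frontier = {(q2, apply(t, u))
                    for (q, u) in frontier
                    for (q1, t, q2) in VASRS_EDGES if q1 == q}
        seen |= frontier
    return seen


def invariant(i, x):
    return 2 * x <= i


start = (1, 0)   # i = 1, x = 0
print(all(invariant(*u) for u in reach_vasr(start, 10)))                               # False
print(all(invariant(*u) for _, u in reach_vasrs({("A", start), ("B", start)}, 10)))    # True
```

The Q-VASR run that takes path (2) twice reaches (i, x) = (3, 2), violating 2x ≤ i, whereas every alternating run of the Q-VASRS preserves the invariant.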

#### **4.1 Technical Details**

In the following, we give a method for over-approximating the transitive closure of a transition formula F(**x**, **x**′) using a Q-VASRS. We start by defining *predicate* Q-VASRS, a variation of Q-VASRS with control states that correspond to disjoint state predicates (where the states intuitively belong to the transition formula F rather than the Q-VASRS itself). We extend linear simulations and best abstractions to predicate Q-VASRS, and give an algorithm for synthesizing best predicate Q-VASRS abstractions (for a given set of predicates). Finally, we give an end-to-end algorithm for over-approximating the transitive closure of a transition formula.

**Definition 6.** *A predicate Q-VASRS over **x** is a Q-VASRS V = (P, E) such that each control state is a predicate over the variables **x** and the predicates in P are pairwise inconsistent (for all p ≠ q ∈ P, p ∧ q is unsatisfiable).*

We extend linear simulations to predicate Q-VASRS as follows:


We define a Q-VASRS abstraction over **x** = x<sub>1</sub>, ..., x<sub>n</sub> to be a pair (S, 𝒱) consisting of a rational matrix S ∈ ℚ<sup>d×n</sup> and a predicate Q-VASRS 𝒱 of dimension d over **x**. We extend the simulation preorder ⊑ to Q-VASRS abstractions in the natural way. Extending the definition of "best" abstractions requires more care, since we can always find a "better" Q-VASRS abstraction (strictly smaller in ⊑ order) by using a finer set of predicates. However, if we consider only predicate Q-VASRS that share the same set of control states, then best abstractions do exist and can be computed using Algorithm 3.

**Algorithm 3.** abstract-VASRS(F, P)

**input:** Transition formula F(**x**, **x**′), set of pairwise-disjoint predicates P over **x** such that for all **u**, **v** with **u** →<sub>F</sub> **v**, there exist p, q ∈ P with p(**u**) and q(**v**) both valid

**output:** Best Q-VASRS abstraction of F with control states P

1. For all p, q ∈ P, let (S<sub>p,q</sub>, V<sub>p,q</sub>) ← abstract-VASR(p(**x**) ∧ F(**x**, **x**′) ∧ q(**x**′));
2. (S, V) ← least upper bound of all (S<sub>p,q</sub>, V<sub>p,q</sub>);
3. For all p, q ∈ P, let T<sub>p,q</sub> ← the simulation matrix from (S<sub>p,q</sub>, V<sub>p,q</sub>) to (S, V);
4. E = {(p, **r**, **a**, q) : p, q ∈ P, (**r**, **a**) ∈ *image*(V<sub>p,q</sub>, T<sub>p,q</sub>)};
5. **return** (S, (P, E))

Algorithm 3 works as follows: first, for each pair of predicates p, q ∈ P, compute a best Q-VASR abstraction of the formula p(**x**) ∧ F(**x**, **x**′) ∧ q(**x**′) and call it (S<sub>p,q</sub>, V<sub>p,q</sub>). (S<sub>p,q</sub>, V<sub>p,q</sub>) over-approximates the transitions of F that begin in a program state satisfying p and end in a program state satisfying q. Second, we compute the least upper bound of all Q-VASR abstractions (S<sub>p,q</sub>, V<sub>p,q</sub>) to get a Q-VASR abstraction (S, V) for F. As a side-effect of the least upper bound computation, we obtain a linear simulation T<sub>p,q</sub> from (S<sub>p,q</sub>, V<sub>p,q</sub>) to (S, V) for each p, q. A best Q-VASRS abstraction of F(**x**, **x**′) with control states P has S as its simulation matrix and has the image of V<sub>p,q</sub> under T<sub>p,q</sub> as the edges from p to q.

**Proposition 3.** *Given a transition formula F(**x**, **x**′) and control states P over **x**, Algorithm 3 computes the best predicate Q-VASRS abstraction of F with control states P.*

We now describe iter-VASRS (Algorithm 4), which uses Q-VASRS to over-approximate the transitive closure of transition formulas. Towards our goal of *predictable* program analysis, we desire the analysis to be *monotone* in the sense that if F and G are transition formulas such that F entails G, then iter-VASRS(F) entails iter-VASRS(G). A sufficient condition to guarantee monotonicity of the overall analysis is to require that the set of control states that we compute for F is at least as fine as the set of control states we compute for G. We can achieve this by taking the set of control states P of the input transition formula $F(\mathbf{x}, \mathbf{x}')$ to be the set of connected regions of the topological closure of $\exists \mathbf{x}'. F$ (lines 1–4). Note that this set of predicates may violate the contract of abstract-VASRS: there may exist a transition $\mathbf{u} \to_F \mathbf{v}$ such that $\mathbf{v} \not\models \bigvee P$ (this occurs when F has a state with no outgoing transitions). As a result, $(S, \mathcal{V}) =$ abstract-VASRS(F, P) does not necessarily over-approximate F; however, it *does* over-approximate $F \land \bigvee P(\mathbf{x}')$. An over-approximation of the transitive closure of F can easily be obtained from $\mathit{reach}(\mathcal{V})(S\mathbf{x}, S\mathbf{x}')$ (the over-approximation of the transitive closure of $F \land \bigvee P(\mathbf{x}')$ obtained from the Q-VASRS abstraction $(S, \mathcal{V})$) by sequentially composing it with the disjunction of F and the identity relation (line 6).
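Why composing with the disjunction of F and the identity suffices can be seen on a finite toy relation: along any F-path, every intermediate state has an outgoing transition, so only the last step can leave the region covered by the predicates. The Python sketch below is a simplification, assuming ⋁P is exactly the set of states with an outgoing transition (rather than connected regions of a topological closure); it checks that composing the reflexive-transitive closure of G = F ∧ ⋁P(x′) with F ∨ Id recovers the closure of F, while the closure of G alone misses transitions into dead-end states. All names are illustrative.

```python
def compose(r1: set, r2: set) -> set:
    """Relational composition: take a step of r1, then a step of r2."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}


def closure(rel: set, states: set) -> set:
    """Reflexive-transitive closure of rel over a finite set of states."""
    result = {(s, s) for s in states}
    while True:
        bigger = result | compose(result, rel)
        if bigger == result:
            return result
        result = bigger


# Toy transition relation F; state 3 has no outgoing transition.
STATES = {0, 1, 2, 3}
F = {(0, 1), (1, 2), (2, 0), (1, 3)}
has_successor = {u for (u, _) in F}                 # stands in for "some p in P holds"
G = {(u, v) for (u, v) in F if v in has_successor}  # F restricted to covered post-states
IDENTITY = {(s, s) for s in STATES}

# closure(G) followed by one optional F-step; on finite relations this is exactly F*.
recovered = compose(closure(G, STATES), F | IDENTITY)
```

In the real analysis the closure of G is only over-approximated by the Q-VASRS reachability relation, so the composed result over-approximates the closure of F rather than equalling it.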


*Precision Improvement.* The abstract-VASRS algorithm uses predicates to infer the control structure of a Q-VASRS, but after computing the Q-VASRS abstraction, iter-VASRS makes no further use of the predicates (i.e., the predicates are irrelevant in the computation of *reach*(V)). Predicates can be used to improve iter-VASRS as follows: the reachability relation of a Q-VASRS is expressed by a formula that uses auxiliary variables to represent the control state at which the computation begins and ends [8]. These variables can be used to encode that the pre-state of the transitive closure must satisfy the predicate corresponding to the begin state and the post-state must satisfy the predicate corresponding to the end state. As an example, consider the program in Fig. 2 and suppose that we wish to prove the invariant $x \le 2i$ under the precondition $i = 0 \land x = 0$. Although this invariant holds, we cannot prove it without the improvement, because there is a counterexample in which the computation begins in a state where i%2 == 1. By applying the above improvement, we can prove that the computation must begin in a state where i%2 == 0, and the invariant is verified.

#### **5 Evaluation**

The goal of our evaluation is to answer the following questions:


We implemented our loop summarization procedure and the compositional whole-program summarization technique described in Sect. 1.1. We ran them on a suite of 165 benchmarks, drawn from the C4B [2] and HOLA [4] suites, as well as the safe, integer-only benchmarks in the loops category of SV-Comp 2019 [22]. We ran each benchmark with a time-out of 5 min, and recorded how many benchmarks were proved safe by our Q-VASR-based technique and our Q-VASRS-based technique. For context, we also compare with CRA [14] (a related loop summarization technique), as well as SeaHorn [7] and UltimateAutomizer [9] (state-of-the-art software model checkers). The results are shown in Fig. 3.

The number of assertions proved correct using Q-VASR is comparable to both SeaHorn and UltimateAutomizer, demonstrating that Q-VASR can indeed model interesting loop phenomena. Q-VASRS-based summarization significantly improves precision, proving the correctness of 93% of assertions in the SV-Comp suite, and more than any other tool in total. Note that the most precise tool for each suite is not strictly better than every other tool; in particular, there is only a single program in the HOLA suite that neither Q-VASRS nor CRA can prove safe.

CRA-based summarization is the most performant of all the compared techniques, followed by Q-VASR and Q-VASRS. SeaHorn and UltimateAutomizer employ abstraction-refinement loops, and so take significantly longer to run the test suite.


**Fig. 3.** Experimental results.

#### **6 Related Work**

*Compositional Analysis.* Our analysis follows the same high-level structure as compositional recurrence analysis (CRA) [5,14]. Our analysis differs from CRA in the way that it summarizes loops: we compute loop summaries by over-approximating loops with vector addition systems and computing reachability relations, whereas CRA computes loop summaries by extracting recurrence relations and computing closed forms. The advantage of our approach is that we can use Q-VASR to accurately model multi-path loops and can make theoretical guarantees about the precision of our analysis; the advantage of CRA is its ability to generate non-linear invariants.

*Vector Addition Systems.* Our invariant generation method draws upon Haase and Halfon's polytime procedure for computing the reachability relation of integer vector addition systems with states and resets [8]. Generalization from the integer case to the rational case is straightforward. Continuous Petri nets [3] are a related generalization of vector addition systems, where time is taken to be continuous (Q-VASR, in contrast, have rational state spaces but discrete time). Reachability for continuous Petri nets is computable in polynomial time [6] and definable in ∃LRA [1].

Sinn et al. present a technique for resource bound analysis that is based on modeling programs by lossy vector addition systems with states [21]. Sinn et al. model programs using vector addition systems with states over the natural numbers, which enables them to use termination bounds for VASS to compute upper bounds on resource usage. In contrast, we use VASS with resets over the rationals, which (in contrast to VASS over ℕ) have a ∃LIRA-definable reachability relation, enabling us to summarize loops. Moreover, Sinn et al.'s method for extracting VASS models of programs is heuristic, whereas our method gives precision guarantees.

*Affine and Polynomial Programs.* The problem of *polynomial* invariant generation has been investigated for various program models that generalize Q-VASR, including solvable polynomial loops [19], (extended) P-solvable loops [11,15], and affine programs [10]. Like ours, these techniques are *predictable* in the sense that they can make theoretical guarantees about invariant quality. The kinds of invariants that can be produced using these techniques (conjunctions of polynomial equations) are incomparable with those generated by the method presented in this paper (∃LIRA formulas).

*Symbolic Abstraction.* The main contribution of this paper is a technique for synthesizing the best abstraction of a transition formula expressible in the language of Q-VASR (with or without states). This is closely related to the *symbolic abstraction* problem, which computes the best abstraction of a formula within an abstract domain. The problem of computing best abstractions has been undertaken for finite-height abstract domains [18], template constraint matrices (including intervals and octagons) [16], and polyhedra [5,24]. Our best abstraction result differs in that (1) it is for a disjunctive domain and (2) the notion of "best" is based on simulation rather than the typical order-theoretic framework.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Invertibility Conditions for Floating-Point Formulas**

Martin Brain<sup>3,4</sup>, Aina Niemetz<sup>1</sup>, Mathias Preiner<sup>1</sup>(B), Andrew Reynolds<sup>2</sup>, Clark Barrett<sup>1</sup>, and Cesare Tinelli<sup>2</sup>

> <sup>1</sup> Stanford University, Stanford, USA, preiner@cs.stanford.edu
>
> <sup>2</sup> The University of Iowa, Iowa City, USA
>
> <sup>3</sup> University of Oxford, Oxford, UK
>
> <sup>4</sup> City, University of London, London, UK

**Abstract.** Automated reasoning procedures are essential for a number of applications that involve bit-exact floating-point computations. This paper presents conditions that characterize when a variable in a floating-point constraint has a solution, which we call invertibility conditions. We describe a novel workflow that combines human interaction and a syntax-guided synthesis (SyGuS) solver that was used for discovering these conditions. We verify our conditions for several floating-point formats. One implication of this result is that a fragment of floating-point arithmetic admits compact quantifier elimination. We implement our invertibility conditions in a prototype extension of our solver CVC4, showing their usefulness for solving quantified constraints over floating-points.

#### **1 Introduction**

Satisfiability Modulo Theories (SMT) formulas including either the theory of floating-point numbers [12] or universal quantifiers [24,32] are widely regarded as some of the hardest to solve. Problems that combine universal quantification over floating-points are rare—experience to date has suggested they are hard for solvers and would-be users should either give up or develop their own incomplete techniques. However, progress in theory solvers for floating-point [11] and the use of expression synthesis for handling universal quantifiers [27,29] suggest that these problems may not be entirely out of reach after all, which could potentially impact a number of interesting applications.

This paper makes substantial progress towards a scalable approach for solving quantified floating-point constraints directly in an SMT solver. Developing procedures for quantified floating-points requires considerable effort, both foundationally and in practice. We focus primarily on establishing a foundation for lifting to quantified floating-point formulas a procedure for solving quantified bit-vector formulas by Niemetz et al. [26]. That procedure relies on so-called *invertibility conditions*, intuitively, formulas that state under which conditions an argument of a given operator and predicate in an equation has a solution. Building on this concept and a state-of-the-art expression synthesis engine [29], we generate invertibility conditions for a majority of operators and predicates in the theory of floating-point numbers. In the context of quantifier-free floating-point formulas, floating-point invertibility conditions may enable us to lift the propagation-based local search approach for bit-vectors in [25] to the theory of floating-point numbers.

This work was supported in part by DARPA (award no. FA8650-18-2-7861), ONR (award no. N68335-17-C-0558) and NSF (award no. 1656926).

This work demonstrates that invertibility conditions exist and show promise for solving quantified floating-point constraints. More specifically, it makes the following contributions:


*Related Work.* To our knowledge, no previous work specifically discusses techniques for solving universally quantified floating-point formulas. Brain et al. [11] provide a comprehensive review of decision procedures for quantifier-free bit-exact floating-point using both SMT-based as well as other approaches. They identify four groups of techniques: bit-blasting approaches that use floating-point circuits to generate bit-vector formulas [13,16,20,33], interval techniques that use partitioning and interval propagation [10,22,23,31], optimization and numerical approaches that work with complete valuations [4,7,18,21], and axiomatic techniques that use partial or total axiomatizations of the theory of floating-point numbers in other theories such as real arithmetic [14,15].

On the other hand, approaches for universal quantification have been developed in modern SMT solvers that target other background theories, including linear arithmetic [8,17,29] and bit-vectors [26,27,32]. At a high level, these approaches use model-based refinement loops that lazily add instances of universal quantifiers until they reach a conflict at the quantifier-free level, or otherwise saturate with a model.

#### **2 Preliminaries**

We assume the usual notions and terminology of many-sorted first-order logic with equality (denoted by ≈). Let Σ be a *signature* consisting of a set $\Sigma^s$ of sort symbols and a set $\Sigma^f$ of interpreted (and sorted) function symbols. Each function symbol f has a sort $\tau_1 \times \dots \times \tau_n \to \tau$, with arity $n \ge 0$ and $\tau_1, \dots, \tau_n, \tau \in \Sigma^s$. We assume that Σ includes a Boolean sort Bool and the Boolean constants ⊤ (true) and ⊥ (false). We further assume the usual definition of well-sorted terms, literals, and (quantified) formulas with variables and symbols from Σ, and refer to them as Σ-terms, Σ-atoms, and so on. For a Σ-term or Σ-formula e, we denote the *free variables* of e (defined as usual) as FV(e) and use e[x] to denote that the variable x occurs free in e. We write e[t] for the term or formula obtained from e by replacing each occurrence of x in e by t.

A *theory* T is a pair $(\Sigma, \mathcal{I})$, where Σ is a signature and $\mathcal{I}$ is a non-empty class of Σ-interpretations (the *models* of T) that is closed under variable reassignment, i.e., every Σ-interpretation that only differs from an $I \in \mathcal{I}$ in how it interprets variables is also in $\mathcal{I}$. A Σ-formula φ is T*-satisfiable* (resp. T*-unsatisfiable*) if it is satisfied by some (resp. no) interpretation in $\mathcal{I}$; it is T*-valid* if it is satisfied by all interpretations in $\mathcal{I}$. We will sometimes omit T when the theory is understood from context.

We briefly recap the terminology and notation of Brain et al. [12], which defines an SMT-LIB theory $T_{FP}$ of floating-point numbers based on the IEEE-754 2008 standard [3]. The signature of $T_{FP}$ includes a parametric family of sorts $F_{\varepsilon,\sigma}$, where ε and σ are integers greater than or equal to 2 giving the number of bits used to store the exponent e and significand s, respectively. Each of these sorts contains five kinds of constants: normal numbers of the form $1.s \cdot 2^{e}$, subnormal numbers of the form $0.s \cdot 2^{e_{\min}}$ with $e_{\min} = 2 - 2^{\varepsilon-1}$, two zeros (+0 and −0), two infinities (+∞ and −∞), and a single not-a-number (NaN). We assume a map $v_{\varepsilon,\sigma}$ for each sort, which maps these constants to their value in the set $\mathbb{R}^* = \mathbb{R} \cup \{+\infty, -\infty, \mathrm{NaN}\}$. The theory also provides a rounding-mode sort RM, which contains five elements {RNE, RNA, RTP, RTN, RTZ}.
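To make the five kinds of constants concrete, the following Python sketch decodes a bit pattern of $F_{\varepsilon,\sigma}$ into its value in $\mathbb{R}^*$. It assumes the standard IEEE-754 binary interchange layout (one sign bit, ε exponent bits, and σ − 1 stored significand bits, so σ counts the hidden bit, as in SMT-LIB); infinities, zeros, and NaN are returned as strings for brevity, and the function name is illustrative. Enumerating $F_{3,5}$ this way yields 227 distinct values once all NaN encodings are identified, matching the count given in footnote 2.

```python
from fractions import Fraction


def decode(bits: int, eb: int, sb: int):
    """Map a bit pattern of sort F_{eb,sb} to its value (sb includes the hidden bit)."""
    mbits = sb - 1                          # stored significand bits
    sign = (bits >> (eb + mbits)) & 1
    exp = (bits >> mbits) & ((1 << eb) - 1)
    man = bits & ((1 << mbits) - 1)
    bias = (1 << (eb - 1)) - 1
    if exp == (1 << eb) - 1:                # all-ones exponent: infinity or NaN
        if man != 0:
            return "NaN"
        return "-inf" if sign else "+inf"
    if exp == 0:
        if man == 0:
            return "-0" if sign else "+0"
        # subnormal: 0.man * 2^(1 - bias), i.e. 0.man * 2^(2 - 2^(eb-1))
        value = Fraction(man, 1 << mbits) * Fraction(2) ** (1 - bias)
    else:
        # normal: 1.man * 2^(exp - bias)
        value = (1 + Fraction(man, 1 << mbits)) * Fraction(2) ** (exp - bias)
    return -value if sign else value


# All 256 bit patterns of F_{3,5}; the 30 NaN encodings collapse to one "NaN".
distinct_values = {decode(b, 3, 5) for b in range(1 << 8)}
```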

Table 1 lists all considered operators and predicate symbols of theory T*FP* . The theory contains a full set of arithmetic operations {|...|, <sup>+</sup>, <sup>−</sup>, ·, <sup>÷</sup>, √, max, min} as well as rem (remainder), rti (round to integral) and fma (combined multiply and add with just one rounding). The precise semantics of these operators is given in [12] and follows the same general pattern: vε,σ is used to project the arguments to R∗, the normal arithmetic is performed in R∗, then the rounding mode and the result are used to select one of the adjoints of vε,σ to convert the result back to Fε,σ. Note that the full theory in [12] includes several additional operators which we omit from discussion here, such as floating-point minimum/maximum, equality with floating-point semantics (fp.eq), and conversions between sorts.

Theory $T_{FP}$ further defines a set of ordering predicates {<, >, ≤, ≥} and a set of classification predicates {isNorm, isSub, isInf, isZero, isNaN, isNeg, isPos}. In the following, we denote the rounding mode of an operation above the operator symbol, e.g., $a \overset{\mathsf{RTZ}}{+} b$ adds a and b and rounds the result towards zero. We use the infix operator style for isInf (... ≈ ±∞), isZero (... ≈ ±0), and isNaN (... ≈ NaN) for conciseness. We further use $\min_n$/$\max_n$ and $\min_s$/$\max_s$ for floating-point constants representing the minimum/maximum normal and subnormal numbers, respectively. We will omit rounding mode and floating-point sorts if they are clear from the context.

#### **3 Invertibility Conditions for Floating-Point Formulas**

In this section, we adapt the concept of invertibility conditions introduced by Niemetz et al. in [26] to our theory $T_{FP}$. Intuitively, an invertibility condition $\varphi_c$ for a literal l[x] is the exact condition under which l[x] has a solution for x, i.e., $\varphi_c$ is equivalent to $\exists x.\, l[x]$ in $T_{FP}$.

**Definition 1** *(Floating-Point Invertibility Condition).* Let l[x] be a $\Sigma_{FP}$-literal. A quantifier-free $\Sigma_{FP}$-formula $\varphi_c$ is an *invertibility condition* for x in l[x] if $x \notin FV(\varphi_c)$ and $\varphi_c \Leftrightarrow \exists x.\, l[x]$ is $T_{FP}$-*valid*.

As a simple example of an invertibility condition, given the literal |x| ≈ t, where |x| denotes the absolute value of x, a solution for x exists if and only if t is not negative, i.e., if ¬isNeg(t) holds. We introduce additional terminology for the sake of the discussion. We define the *dimension* of an invertibility condition problem ∃x. l[x] as the number of free variables it contains. For example, if s and t are variables, then the dimension of ∃x. x + s ≈ t is two, the dimension of ∃x. isZero(x + s) is one, and the dimension of ∃x. isZero(|x|) is zero. A literal l[x] is *fully invertible* if its invertibility condition is ⊤. A term e is an (unconditional) *inverse* for x in l[x] if l[e] is equivalent to ⊤. For example, the literal −x ≈ t is fully invertible and −t is an inverse for x in this literal. We say that e is a *conditional inverse* for l[x] if l[e] is an invertibility condition for l[x].
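The claim that ¬isNeg(t) is an invertibility condition for |x| ≈ t can be checked exhaustively for a fixed small format, in the spirit of the fixed-format checking described in Sect. 4. The Python sketch below works directly on bit patterns and assumes the IEEE layout, that |x| clears the sign bit, that isNeg is false for NaN and true for −0, and that ≈ identifies all NaN encodings (as in SMT-LIB). The helper names are hypothetical, and passing for a few formats is evidence for, not a proof of, the general condition.

```python
def make_format(eb: int, sb: int):
    """Bit-level helpers for the sort F_{eb,sb} (sb counts the hidden bit)."""
    mbits = sb - 1
    sign_bit = 1 << (eb + mbits)
    exp_mask = ((1 << eb) - 1) << mbits
    man_mask = (1 << mbits) - 1

    def is_nan(b: int) -> bool:
        return (b & exp_mask) == exp_mask and (b & man_mask) != 0

    def is_neg(b: int) -> bool:
        # isNeg is false for NaN; it is true for -0 and -inf.
        return not is_nan(b) and bool(b & sign_bit)

    def fp_abs(b: int) -> int:
        return b & ~sign_bit  # clear the sign bit

    def smt_eq(a: int, b: int) -> bool:
        # SMT-LIB equality: all NaN encodings denote one value; otherwise bitwise.
        return (is_nan(a) and is_nan(b)) or a == b

    return range(1 << (eb + sb)), is_neg, fp_abs, smt_eq


def is_ic_for_abs(eb: int, sb: int, candidate) -> bool:
    """Check candidate(t, is_neg) <=> (exists x. |x| ~ t) for every t of the format."""
    values, is_neg, fp_abs, smt_eq = make_format(eb, sb)
    return all(
        any(smt_eq(fp_abs(x), t) for x in values) == candidate(t, is_neg)
        for t in values
    )


# Usage: the condition from the text, checked on two small formats.
for eb, sb in [(3, 5), (4, 5)]:
    print((eb, sb), is_ic_for_abs(eb, sb, lambda t, is_neg: not is_neg(t)))
```

A wrong candidate such as "always true" fails the same check, because no x satisfies |x| ≈ t for a negative t.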

Our primary goal in this work is to establish invertibility conditions for all floating-point constraints that contain exactly one operator and one predicate. These conditions collectively suffice to characterize when any literal l[x] containing exactly one occurrence of x, the variable to solve for, has a solution. In total, we were able to establish 167 out of 188 invertibility conditions (counting commutative cases only once) using a syntax-guided synthesis framework, which we describe in more detail in Sect. 4. In this section, we present a subset of these invertibility conditions, highlighting the most interesting cases where we succeeded (or failed) to establish an invertibility condition. Due to space restrictions, we omit the conditions for the remaining cases.<sup>1</sup>

**Table 1.** Considered floating-point predicates/operators, with SMT-LIB 2 syntax.


**Table 2.** Invertibility conditions for floating-point operators (excl. fma) with ≈.

Table 2 lists the invertibility conditions for equality with the operators {+, −, ·, ÷, rem, √, |...|, −, rti}, parameterized over a rounding mode R (one of RNE, RNA, RTP, RTN, or RTZ). Note that operators {+, ·} and the multiplicative step of fma are commutative, and thus the invertibility conditions for both variants are identical.

Each of the first six invertibility conditions in this table follows a pattern. The first two disjuncts are instances of the literal to solve for, where a term involving rounding modes RTP and RTN is substituted for x. These disjuncts are followed by disjuncts for handling special cases for infinity and zero. From the structure of these conditions, e.g., for +, we can derive the insight that if there is a solution for x in the equation $x \overset{R}{+} s \approx t$ and we are not in a corner case where s = t, then either $t \overset{\mathsf{RTP}}{-} s$ or $t \overset{\mathsf{RTN}}{-} s$ must be a solution. Based on extensive runs of our syntax-guided synthesis procedure, we believe this condition is close to having minimal term size. From this, we conclude that an efficient yet complete method for solving $x \overset{R}{+} s \approx t$ checks, in the non-trivial case when s and t are disequal, whether t − s rounded towards positive or towards negative is a solution, and concludes that no solution exists if neither is. A similar insight can be derived for the other invertibility conditions of this form.
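The candidate-checking procedure described above can be sketched for one concrete format. The Python sketch below assumes IEEE binary64 (Python's `float`) with RNE as the rounding mode of the addition, handles only finite s and t whose difference lies in the binary64 range, and compares with Python's `==`, which identifies +0 and −0 (unlike ≈ in $T_{FP}$). It needs Python 3.9+ for `math.nextafter`; the function names are hypothetical. Any returned x is verified, so it is a genuine solution; completeness of trying only the two directed roundings is the claim made in the text, not something the code establishes.

```python
import math
from fractions import Fraction


def directed_roundings(q: Fraction) -> tuple[float, float]:
    """Round the exact rational q toward -inf and toward +inf in binary64.

    Assumes q lies within the finite binary64 range (float(q) overflows otherwise).
    """
    nearest = float(q)  # CPython converts Fraction -> float with correct rounding
    if Fraction(nearest) == q:
        return nearest, nearest
    if Fraction(nearest) > q:
        return math.nextafter(nearest, -math.inf), nearest
    return nearest, math.nextafter(nearest, math.inf)


def solve_add_rne(s: float, t: float):
    """Look for x with x + s == t (binary64, RNE) by trying t -RTN s and t -RTP s.

    Returns a verified solution, or None if neither candidate works.
    """
    if not (math.isfinite(s) and math.isfinite(t)):
        raise ValueError("sketch only handles finite s and t")
    down, up = directed_roundings(Fraction(t) - Fraction(s))
    for x in (down, up):
        if x + s == t:  # Python float addition rounds to nearest, ties to even
            return x
    return None


print(solve_add_rne(3.0, 10.0))   # an exact difference is found directly
print(solve_add_rne(1.0, 1e-17))  # no double x satisfies x + 1.0 == 1e-17
```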

<sup>1</sup> Available at https://cvc4.cs.stanford.edu/papers/CAV2019-FP.

We found that t is a conditional inverse for the cases $\overset{R}{\mathsf{rti}}(x) \approx t$ and $x \,\mathsf{rem}\, s \approx t$, that is, substituting t for x yields an invertibility condition. For the latter, we discovered an alternative invertibility condition:

$$|t \overset{\mathsf{RTP}}{+} t| \le |s| \lor |t \overset{\mathsf{RTN}}{+} t| \le |s| \lor \mathrm{ite}(t \approx \pm 0, s \not\approx 0, t \not\approx \pm \infty) \tag{1}$$

In contrast to the condition from Table 2, this version does not involve rem. It follows that certain applications of floating-point remainder, including those whose first argument is an unconstrained variable, can be eliminated based on this equivalence. Interestingly, for <sup>s</sup> rem <sup>x</sup>≈t, we did not succeed in finding an invertibility condition. This case appears to not admit a concise solution; we discuss further details below.

Table 3 gives the invertibility conditions for ≥. Since these constraints admit more solutions, they typically have simpler invertibility conditions. In particular, with the exception of rem, all conditions only involve floating-point classifiers.

When considering literals with predicates, the invertibility conditions for cases involving x + s and s − x are identical for every predicate and rounding mode. This is due to the fact that s − x is equivalent to s + (−x), independent of the rounding mode. Thus, the negation of the inverse value of x for an equation involving x + s is the inverse value of x for an equation involving s − x. Similarly, the invertibility conditions for x · s and s ÷ x over predicates {<, ≤, >, ≥, isInf, isNaN, isNeg, isZero} are identical for all rounding modes.

For all predicates except {≈, isNorm, isSub}, the invertibility conditions for operators {+, −, ÷, ·} contain floating-point classifiers only. All of these conditions are also independent of the rounding mode. Similarly, for operator fma over predicates {isInf, isNaN, isNeg, isPos}, the invertibility conditions contain only floating-point classifiers. All of these conditions except for isNeg(fma(x, s, t)) and isPos(fma(x, s, t)) are also independent of the rounding mode.

**Table 3.** Invertibility conditions for floating-point operators (excl. fma) with ≥.

For all floating-point operators with predicate isNaN, the invertibility condition is ⊤, i.e., an inverse value for x always exists. This is because every floating-point operator returns NaN if one of its operands is NaN, hence NaN can be picked as an inverse value of x. Conversely, we identified four cases for which the invertibility condition is ⊥, i.e., an inverse value for x never exists. These four cases are isNeg(|x|), isInf(x rem s), isInf(s rem x), and isSub(rti(x)). For the first three cases, it is obvious why no inverse value exists. The intuition for isSub(rti(x)) is that integers are not subnormal, and as a result if x is rounded to an integer it can never be a subnormal number. All of these cases can be easily implemented as rewrite rules in an SMT solver.

For operator fma, the invertibility conditions over predicates {isInf, isNaN, isNeg, isPos} contain floating-point classifiers only. For predicate isZero, the invertibility conditions are more involved. Equations (2) and (3) show the invertibility conditions for isZero(fma(x, s, t)) and isZero(fma(s, t, x)) for all rounding modes R.

$$\overset{R}{\mathsf{fma}}(-(t \overset{\mathsf{RTP}}{\div} s), s, t) \approx \pm 0 \lor \overset{R}{\mathsf{fma}}(-(t \overset{\mathsf{RTN}}{\div} s), s, t) \approx \pm 0 \lor (s \approx \pm 0 \land t \approx \pm 0) \tag{2}$$

$$\overset{R}{\mathsf{fma}}(s, t, -(s \overset{\mathsf{RTP}}{\cdot} t)) \approx \pm 0 \lor \overset{R}{\mathsf{fma}}(s, t, -(s \overset{\mathsf{RTN}}{\cdot} t)) \approx \pm 0 \tag{3}$$

These two invertibility conditions contain case splits similar to those in Table 2 and indicate that, e.g., $-(t \overset{\mathsf{RTP}}{\div} s)$ is an inverse value for x when $\overset{R}{\mathsf{fma}}(-(t \overset{\mathsf{RTP}}{\div} s), s, t) \approx \pm 0$ holds.

As we will describe in Sect. 4, an important aspect of synthesizing these invertibility conditions was considering their visualizations. This helped us determine which invertibility conditions were relatively simple and which exhibited complex behavior.

**Fig. 1.** Invertibility conditions for {+, ·, ÷} over ≈ for $F_{3,5}$ and rounding mode RNE.

**Fig. 2.** Invertibility conditions for rem over ≈ for $F_{3,5}$.

Figure 1 shows the visualizations of the invertibility conditions for operators {+, ·, ÷} over ≈ from Table 2 for sort $F_{3,5}$ with rounding mode RNE (each of the literals is two-dimensional). We use 227 × 227 pixel maps over all possible values of s and t, where the pixel at point (s, t) is white if the invertibility condition is true, and black if it is false.<sup>2</sup> The values of s are plotted on the horizontal axis and the values of t on the vertical axis. The leftmost two columns (resp. topmost two rows) give the value of the invertibility condition for s = ±0 (resp. t = ±0); the rightmost column (resp. bottom row) gives its value for NaN; the two columns left of (resp. the two rows above) NaN give its value for ±∞; the remaining columns (resp. rows) plot the subnormal and normal values of s (resp. t), left-to-right (resp. top-to-bottom) in increasing order of absolute value, alternating between positive and negative values. These visualizations give an intuition of the complexity of the behavior of invertibility conditions, which is a consequence of the complex semantics of floating-point operations.
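The axis layout just described can be reproduced by enumerating the values of $F_{3,5}$. The Python sketch below assumes the standard IEEE encoding (σ counts the hidden bit), collapses NaN to one value as footnote 2 describes, and assumes that within each ± pair the positive value comes first; the text does not specify that order. The function names are illustrative.

```python
from fractions import Fraction


def positive_finite_nonzero(eb: int, sb: int) -> list:
    """Positive subnormal and normal values of F_{eb,sb}, in increasing order."""
    mbits = sb - 1
    bias = (1 << (eb - 1)) - 1
    values = []
    for exp in range((1 << eb) - 1):        # skip the all-ones exponent (inf/NaN)
        for man in range(1 << mbits):
            frac = Fraction(man, 1 << mbits)
            if exp == 0:
                if man == 0:
                    continue                # zeros are placed separately
                values.append(frac * Fraction(2) ** (1 - bias))           # subnormal
            else:
                values.append((1 + frac) * Fraction(2) ** (exp - bias))  # normal
    return values  # (exp, man) order coincides with increasing magnitude


def axis_order(eb: int, sb: int) -> list:
    """Axis layout: +-0, then +-v by increasing |v|, then +-inf, then a single NaN."""
    axis = ["+0", "-0"]
    for v in positive_finite_nonzero(eb, sb):
        axis += [v, -v]                     # assumed: positive before negative
    axis += ["+inf", "-inf", "NaN"]
    return axis


axis = axis_order(3, 5)
print(len(axis))  # one row/column per value in a 227 x 227 map
```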

Figure 2 gives the invertibility condition visualizations for remainder over ≈ with sort $F_{3,5}$ and rounding mode RNE. The left-hand visualization shows that solving for x as the first argument is relatively easy. It suggests that an invertibility condition for this case involves a linear inequality relating the absolute values of s and t, which we were able to derive in Eq. (1). Solving for x as the second argument, on the other hand, is much more difficult, as indicated by the right-hand picture, which has a significantly more complex structure. We conjecture that no simple solution exists for the latter problem. The visualization of the invertibility condition gives some intuition for this: the diagonal divide is caused by the fact that the output t always has a smaller absolute value than the input s. The top-left corner represents subnormal/subnormal computation; this region acts as a fixed point and behaves differently from the rest of the function. The stepped blocks along the diagonal occur when s and t have the same exponent, and thus the pattern is similar to the invertibility condition for + shown in Fig. 1. Portions right of the main diagonal appear to exhibit random behavior. We believe this is the result of repeated cancellations in the computation of the remainder for those values, which suggests a behavior similar to that of the Blum-Blum-Shub random number generator [9].

<sup>2</sup> Notice that we treat all $(2^{\sigma-1}-1) \cdot 2$ NaN encodings of $T_{FP}$ as a single NaN value. Thus, for sort $F_{3,5}$ we have 227 floating-point values (instead of $2^8 = 256$).

**Fig. 3.** Invertibility conditions for rem over inequalities for $F_{3,5}$.

**Fig. 4.** Invertibility conditions for fma over {isZero, isSub} for $F_{3,5}$ and rounding mode RNE.

For remainder with inequalities, we succeeded in determining invertibility conditions for <sup>≤</sup> and <sup>≥</sup> if <sup>x</sup> is the first argument. However, for <sup>x</sup> rem <sup>s</sup> over {<, >}, and <sup>s</sup> rem <sup>x</sup> over {≥, <sup>≤</sup>, <, >} we did not. This is particularly surprising considering that the invertibility conditions for non-strict and strict inequalities are nearly identical (varying only by a handful of pixels), as shown in Fig. 3. Note that for x as the first argument, all variations of the concise invertibility conditions for non-strict inequality we considered failed as solutions for the strict inequality. This behavior is representative of the many subtle corner cases we encountered while synthesizing these conditions.

Figure 4 shows visualizations for invertibility conditions involving fma. The left two images are visualizations for the invertibility conditions for isZero. The corresponding invertibility conditions are given in Eqs. (2) and (3) above. We were not able to determine invertibility conditions for operator fma over predicate isSub, which are visualized in the rightmost two pictures in Fig. 4. Finally, we did not succeed in finding invertibility conditions for fma with binary predicates, which are particularly challenging since they are three-dimensional. Finding solutions for these cases is ongoing work (see Sect. 4 for a more in-depth discussion).

#### **4 Synthesis of Floating-Point Invertibility Conditions**

Deriving invertibility conditions in T*FP* is a highly challenging task. We were unable to derive these conditions manually despite our substantial background knowledge of floating-point numbers. As a consequence, we developed a custom extension of the syntax-guided synthesis (SyGuS) paradigm [1] with the goal of finding invertibility conditions automatically, which resulted in the conditions from Sect. 3. While the extension was optimized for this task, we stress that our techniques are theory-agnostic and can be used for synthesis problems over any finite domain. Our approach builds upon the SyGuS capabilities of the SMT solver CVC4 [5,29], which has recently been extended to support reasoning about the theory of floating-points [11]. We use the invertibility condition for floating-point addition with equality here as a running example.

Establishing an invertibility condition requires solving a synthesis problem with *three* levels of quantifier alternation. In particular, for floating-point addition with equality, we are interested in finding a solution for predicate IC that satisfies the conjecture:

$$\exists \mathsf{IC}.\, \forall s, t.\, \left( \mathsf{IC}(s, t) \Leftrightarrow \exists x.\, x \overset{\mathsf{R}}{+} s \approx t \right) \tag{4}$$

for some rounding mode R. In other words, this conjecture states that IC(s, t) holds exactly when there exists an x such that adding x to s, with the result rounded according to mode R, yields t. Furthermore, we are interested in finding a solution for IC that holds *independently of the format* of x, s, t. Note that SMT solvers are not capable of reasoning about constraints that are parametric in the floating-point format. To address this challenge, following the methodology from previous work [26], our strategy for establishing (general) invertibility conditions first solves the synthesis conjecture for a fixed format F<sub>ε,σ</sub>, and subsequently checks whether that solution also holds for other formats. The choice of the number of exponent bits ε and significand bits σ in F<sub>ε,σ</sub> balances two criteria:


In our experience, the best choices for (ε, σ) depended on the particular invertibility condition we were solving. The most common choices for (ε, σ) were (3, 5), (4, 5) and (4, 6). For most two-dimensional invertibility conditions (those that involve two variables s and t), we used (3, 5), since the required synthesis procedures mentioned below were roughly eight times faster than for (4, 5). For one-dimensional invertibility conditions, we often used higher-precision formats. Since floating-point operators like addition take a rounding mode R as an additional argument, we assumed a fixed rounding mode when solving, and then cross-checked our solution for multiple rounding modes.

Assume we have chosen to synthesize the invertibility condition for conjecture (4) for format F<sub>3,5</sub> and rounding mode RNE. Notice that current SyGuS solvers [2,29] support only two levels of quantifier alternation. However, we can expand the innermost quantifier in this conjecture to obtain the conjecture:

$$\exists \mathsf{IC}.\, \forall s, t.\, \left( \mathsf{IC}(s, t) \Leftrightarrow \bigvee\_{i=0}^{226} i \overset{\mathsf{RNE}}{+} s \approx t \right) \tag{5}$$

where for simplicity of notation we use i = 0, ..., 226 to denote the values of F<sub>3,5</sub>. This methodology was also used by Niemetz et al. [26], where invertibility conditions for bit-vector operators were synthesized for bit-width 4 by giving a conjecture of the above form to an off-the-shelf SyGuS solver. In contrast to that work, we found that the synthesis conjecture above is too challenging to be solved efficiently by current state-of-the-art enumerative SyGuS solvers. The reason for this is twofold. First, the smallest viable floating-point format is 3 + 5 = 8 bits, which requires the body of (5) to have a large number of disjuncts (227), more than ten times the 16 disjuncts required when synthesizing 4-bit invertibility conditions for bit-vectors. Second, floating-point formulas are much harder to solve than bit-vector formulas, due to the complexity of their bit-blasted encodings. Thus, a challenging satisfiability query must be solved *for each* candidate considered within the SyGuS solver.
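The figure of 227 values can be reproduced from the encoding, assuming the SMT-LIB conventions that σ counts the hidden significand bit, that +0 and −0 are distinct values, and that all NaN bit patterns denote a single NaN. The helper below (the name `num_fp_values` is ours) is a minimal sketch of that count; it also yields the 483 values of F<sub>4,5</sub> used later in this section.

```python
def num_fp_values(eb: int, sb: int) -> int:
    """Number of distinct values of the SMT-LIB sort (_ FloatingPoint eb sb).

    Assumes sb includes the hidden bit, so a value occupies
    1 (sign) + eb (exponent) + (sb - 1) (stored significand) = eb + sb bits,
    and that every NaN bit pattern denotes the same single NaN value.
    """
    bit_patterns = 2 ** (eb + sb)
    # NaN patterns: exponent field all ones, stored significand nonzero, either sign.
    nan_patterns = 2 * (2 ** (sb - 1) - 1)
    return bit_patterns - nan_patterns + 1  # collapse all NaNs into one value


for eb, sb in [(3, 5), (4, 5), (4, 6)]:
    n = num_fp_values(eb, sb)
    print(f"F_{eb},{sb}: {n} values, {n * n} (s, t) pairs")
```

For F<sub>3,5</sub> this gives 227 values and 227 × 227 = 51,529 pairs of (s, t).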

To address the above challenges, we perform a more extreme preprocessing step on our synthesis conjecture, which computes the input/output behavior of the invertibility condition on all points in the domain of s and t. In other words, we rephrase our synthesis conjecture as:

$$\exists \mathsf{IC}.\, \bigwedge\_{i=0}^{226} \bigwedge\_{j=0}^{226} \left( \mathsf{IC}(i, j) \Leftrightarrow c\_{i,j} \right) \tag{6}$$

where each c<sub>i,j</sub> is a Boolean constant (either ⊤ or ⊥) determined by a quantifier-free satisfiability query. In particular, for each pair of floating-point values (i, j), constant c<sub>i,j</sub> is ⊤ if x +<sup>RNE</sup> i ≈ j is satisfiable, and ⊥ if it is unsatisfiable. In practice, we represent the above conjecture as a 227 × 227 table, which we call the *full I/O specification* of invertibility condition IC. In our experiments, computing this table for most two-dimensional invertibility conditions of sort F<sub>3,5</sub> required 15 min (for 227 × 227 = 51,529 quantifier-free queries), and 2 h for sort F<sub>4,5</sub> (requiring 483 × 483 = 233,289 queries). This process was accelerated by first applying random sampling over possible values of x to quickly test whether a query was satisfiable. For some operators, notably remainder, this required significantly more time than for others (up to a factor of 2). Due to the high cost of this preprocessing step, we generated a database with the full I/O specifications for *all* invertibility conditions from Sect. 3 using a cluster of 50 nodes with Intel Xeon E5-2637 3.5 GHz CPUs and 32 GB memory, and then shared this database among multiple developers. Computing the full I/O specifications for F<sub>3,5</sub>, F<sub>4,5</sub>, and F<sub>4,6</sub> required a total of 459 days of CPU time (6.1 days for F<sub>3,5</sub>, 54.7 for F<sub>4,5</sub>, and 398.5 for F<sub>4,6</sub>). Despite the heavy cost of this step, it was crucial for accelerating our framework for synthesizing invertibility conditions, described next.
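The shape of this preprocessing step (cheap random sampling of x first, an exhaustive check only when sampling finds no witness) can be sketched on a domain small enough to enumerate. The sketch below is not our floating-point implementation: it swaps in 4-bit unsigned bit-vector multiplication for floating-point addition and replaces each quantifier-free SMT query with a brute-force search over x. The names `full_io_spec`, `bvmul`, `WIDTH`, and the default of four random samples per entry are hypothetical choices made for illustration.

```python
import random
from typing import Callable, List

WIDTH = 4                      # toy bit-width, standing in for a small FP format
MASK = (1 << WIDTH) - 1
DOMAIN = range(1 << WIDTH)     # every 4-bit unsigned value


def bvmul(x: int, s: int) -> int:
    """Multiplication modulo 2**WIDTH."""
    return (x * s) & MASK


def full_io_spec(literal: Callable[[int, int, int], bool],
                 samples: int = 4, seed: int = 0) -> List[List[bool]]:
    """Return table[s][t], which is True iff some x satisfies literal(x, s, t)."""
    rng = random.Random(seed)
    table = []
    for s in DOMAIN:
        row = []
        for t in DOMAIN:
            # Cheap first pass: a few random guesses for x.
            found = any(literal(rng.choice(DOMAIN), s, t) for _ in range(samples))
            if not found:
                # Fallback standing in for the exact satisfiability query.
                found = any(literal(x, s, t) for x in DOMAIN)
            row.append(found)
        table.append(row)
    return table


# The literal x * s == t plays the role of x + s ≈ t in conjecture (6).
spec = full_io_spec(lambda x, s, t: bvmul(x, s) == t)
```

As a sanity check, every entry of this table agrees with the condition ((−s | s) & t) ≈ t, which holds exactly when x · s ≈ t has a solution modulo 2<sup>4</sup>; random sampling only shortcuts the satisfiable entries, while unsatisfiable ones always pay for the full search.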

**Fig. 5.** Architecture for synthesizing invertibility conditions for floating-point formulas.

Figure 5 summarizes our architecture for solving synthesis conjectures of the above form. The user first selects an invertibility condition problem to solve, where we assume the full I/O specification has been computed using the aforementioned techniques. At a high level, our architecture can be seen as an *interactive synthesis environment*, where the user manages the interaction between two subprocedures:


We use a counterexample-guided loop, where the SyGuS solver provides the solution verifier with candidate solutions, and the solution verifier provides the SyGuS solver with an evolving subset of sample points taken from the full I/O specification. These points correspond to counterexamples to failed candidate solutions, and are sampled uniformly at random over the domain of our specification. To speed up convergence, we configure the solution verifier to generate multiple counterexample points (typically 10) in each iteration of the loop. The process terminates when the SyGuS solver generates a candidate solution that is correct for all points of its full I/O specification.
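The loop can be sketched as follows, under simplifying assumptions that are ours: the SyGuS solver is replaced by a hypothetical stand-in that returns the first candidate from a fixed, user-ordered list that agrees with all sample points collected so far, and the full I/O specification is a toy 4-bit bit-vector multiplication table rather than a floating-point one. The verifier returns up to `k` counterexamples drawn uniformly at random, mirroring the batch of points per iteration described above.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[int, int]
Pred = Callable[[int, int], bool]


def cegis(spec: Dict[Point, bool], candidates: List[Tuple[str, Pred]],
          k: int = 10, seed: int = 0) -> Optional[str]:
    """Counterexample-guided loop over a fixed, ordered list of candidates."""
    rng = random.Random(seed)
    samples: Dict[Point, bool] = {}          # points sent to the "solver"
    while True:
        # Hypothetical SyGuS stand-in: first candidate consistent with samples.
        cand = next(((name, p) for name, p in candidates
                     if all(p(s, t) == v for (s, t), v in samples.items())), None)
        if cand is None:
            return None
        name, pred = cand
        # Solution verifier: check the candidate against the full I/O spec.
        wrong = [pt for pt, v in spec.items() if pred(*pt) != v]
        if not wrong:
            return name
        # Send back up to k counterexamples, sampled uniformly at random.
        for pt in rng.sample(wrong, min(k, len(wrong))):
            samples[pt] = spec[pt]


# Toy full I/O specification: is x * s == t solvable over 4-bit vectors?
spec = {(s, t): any(((x * s) & 0xF) == t for x in range(16))
        for s in range(16) for t in range(16)}
candidates = [
    ("true", lambda s, t: True),
    ("s != 0 or t == 0", lambda s, t: s != 0 or t == 0),
    ("s % 2 == 1 or t % 2 == 0", lambda s, t: s % 2 == 1 or t % 2 == 0),
    ("((-s | s) & t) == t", lambda s, t: ((-s | s) & t) == t),
]
result = cegis(spec, candidates)
print(result)
```

Each failed candidate receives at least one counterexample it disagrees with, so it is never proposed again and the loop terminates; it returns `None` when no listed candidate fits the samples.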

We give the user control over both the solutions and counterexample points generated in this loop. First, as is commonly done in syntax-guided synthesis applications, the user in our workflow provides an input grammar to the SyGuS solver. This is a context-free grammar in a standard format [28], which contains a guess of the operators and patterns that may be involved in the invertibility condition we are synthesizing. Second, note that the domain of floating-point numbers can be subdivided into a number of subdomains and special cases (e.g. normal, subnormal, not-a-number, infinity), as well as split into different classifications (e.g. positive and negative). Our workflow allows the user to provide a *side condition*, whose purpose is to focus on finding an invertibility condition that is correct for one of these subdomains. The side condition acts as a filtering mechanism on the counterexample points generated by the solution verifier. For example, given the side condition isNorm(s)∧isNorm(t), the solution verifier checks candidate solutions generated by the SyGuS solver only against points (s, t) where both arguments are normal, and consequently only communicates counterexamples of this form to the SyGuS solver. The solution verifier may also be configured to establish that the current candidate solution generated by the SyGuS solver is *conditionally* correct, that is, it is true on all points in the domain that satisfy the side condition.
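A side condition only changes which points the verifier looks at. The following sketch illustrates this with Python's binary64 floats, assuming that `is_norm` and `is_sub` mirror the isNorm and isSub classifications (normal: finite with magnitude at least the smallest normal value; subnormal: nonzero with smaller magnitude). All function names are hypothetical, and the specification entries in the usage lines are made-up labels, not results from our database.

```python
import math
import sys
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
BinPred = Callable[[float, float], bool]


def is_norm(v: float) -> bool:
    """Normal binary64 value: finite and at least the smallest normal magnitude."""
    return math.isfinite(v) and abs(v) >= sys.float_info.min


def is_sub(v: float) -> bool:
    """Subnormal binary64 value: nonzero and below the smallest normal magnitude."""
    return v != 0.0 and abs(v) < sys.float_info.min


def filter_counterexamples(points: List[Point], side: BinPred) -> List[Point]:
    """Keep only the counterexamples that satisfy the side condition."""
    return [(s, t) for s, t in points if side(s, t)]


def conditionally_correct(pred: BinPred, spec: Dict[Point, bool],
                          side: BinPred) -> bool:
    """True iff pred matches spec on every point satisfying the side condition."""
    return all(pred(s, t) == v for (s, t), v in spec.items() if side(s, t))


def both_normal(s: float, t: float) -> bool:   # isNorm(s) ∧ isNorm(t)
    return is_norm(s) and is_norm(t)


def always_true(s: float, t: float) -> bool:   # a trivial candidate
    return True


points = [(1.0, 2.0), (5e-324, 1.0), (1.0, 0.0), (math.inf, 1.0)]
print(filter_counterexamples(points, both_normal))         # [(1.0, 2.0)]

spec = {(1.0, 2.0): True, (5e-324, 1.0): False}           # made-up labels
print(conditionally_correct(always_true, spec, both_normal))  # True
```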

There are several advantages to the form of the synthesis conjecture in (6) that we exploit in our workflow. First, its structure makes it easy to divide the problem into sub-cases: at any given time, our synthesis workflow sends only the conjuncts of (6) for a subset of the (i, j) pairs. As a result, we do not burden the underlying SyGuS solver with the entire conjecture at once, which would not scale in practice. A second advantage is that it is in *programming-by-examples* (PBE) form, since it consists of a conjunction of concrete input-output pairs. As a consequence, specialized algorithms can be used by the SyGuS solver to generate solutions for (approximations of) our conjecture in a way that is highly scalable in practice. These techniques are broadly referred to as decision tree learning or unification algorithms. As a brief review (see Alur et al. [2] for a recent SyGuS-based approach), a decision tree learning algorithm is given as input a set of good examples c<sub>1</sub> → ⊤, ..., c<sub>n</sub> → ⊤ and a set of bad examples d<sub>1</sub> → ⊥, ..., d<sub>m</sub> → ⊥. Its goal is to find a predicate, or *classifier*, that evaluates to true on all the good examples and to false on all the bad examples. In our context, a classifier is expressed as an if-then-else tree of Boolean sort. Sampling the space of conjecture (6) provides the decision tree algorithm with good and bad examples, and the returned classifier is a candidate solution that we give to the solution verifier. The SyGuS solver of CVC4 uses a decision tree learning algorithm, which we rely on in our workflow. Due to the scalability of this algorithm and the fact that only a small subset of our conjecture is considered at any given time, the SyGuS solver in our framework typically generates candidate solutions within seconds.

Another important aspect of the SyGuS solver in Fig. 5 is that it is configured to generate *multiple* solutions for the current set of sample points. Due to the way the SyGuS-based decision tree learning algorithm works, these solutions tend to become *more general* over the runtime of the solver. As a simple example (assuming exact integer arithmetic), say the solver is given input points (1, 1) → ⊤, (2, 0) → ⊤, (1, 0) → ⊥ and (0, 1) → ⊥ for (s, t). It enumerates predicates over s and t, starting with the simplest ones, say s ≈ 0, t ≈ 0, s ≈ 1, t ≈ 1, s + t > 1, and so on. After generating the first four predicates, it constructs the solution ite(s ≈ 1, t ≈ 1, t ≈ 0), which is a correct classifier for the given set of points. However, after generating the fifth predicate in this list, it returns s + t > 1 itself as a solution; this can be seen as a generalization of the previous solution since it requires no case splitting.
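The integer example above can be replayed with a brute-force stand-in for the decision tree learner. This is not the algorithm used by CVC4: it simply enumerates the predicates in the listed order and, after each new predicate, looks first for a single predicate and then for one if-then-else over the predicates seen so far that classifies all points correctly. It nonetheless reproduces the two solutions discussed above, in the same order.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Point = Tuple[int, int]
Pred = Tuple[str, Callable[[int, int], bool]]   # (printed form, evaluator)


def fits(f: Callable[[int, int], bool], examples: Dict[Point, bool]) -> bool:
    """Does f agree with every labelled example?"""
    return all(f(s, t) == label for (s, t), label in examples.items())


def classify(examples: Dict[Point, bool], pool: List[Pred]) -> Optional[str]:
    # Prefer a single predicate, which needs no case split.
    for name, f in pool:
        if fits(f, examples):
            return name
    # Otherwise try one if-then-else over the predicates enumerated so far.
    for (cn, c), (an, a), (bn, b) in product(pool, repeat=3):
        if fits(lambda s, t: a(s, t) if c(s, t) else b(s, t), examples):
            return f"ite({cn}, {an}, {bn})"
    return None


def enumerate_solutions(examples: Dict[Point, bool],
                        grammar: List[Pred]) -> Iterator[str]:
    """After each newly enumerated predicate, yield a classifier if one exists."""
    pool: List[Pred] = []
    for pred in grammar:
        pool.append(pred)
        solution = classify(examples, pool)
        if solution is not None:
            yield solution


examples = {(1, 1): True, (2, 0): True, (1, 0): False, (0, 1): False}
grammar = [
    ("s == 0", lambda s, t: s == 0),
    ("t == 0", lambda s, t: t == 0),
    ("s == 1", lambda s, t: s == 1),
    ("t == 1", lambda s, t: t == 1),
    ("s + t > 1", lambda s, t: s + t > 1),
]
for sol in enumerate_solutions(examples, grammar):
    print(sol)
# ite(s == 1, t == 1, t == 0)
# s + t > 1
```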

Since, in our experience, more general candidate solutions are more likely to be actual solutions, our workflow critically relies on the ability of users to manually terminate the synthesis procedure once they are satisfied with the last generated candidate. Our synthesis procedure logs a list of candidate solutions that satisfy the conjecture on the current set of sample points. When the user terminates the synthesis process, the solution verifier checks the last solution generated in this list. Users have the option to rearrange the elements of this list by hand if they have an intuition that a specific candidate is more likely to be correct and so should be tested first.

*Experience.* The first challenging invertibility condition we solved with our framework was addition with equality for rounding mode RNE. Initially, we used a generic grammar that contained the entire floating-point signature. As a first key step towards solving this problem, the synthesis procedure suggested the single literal t ≈ s +<sup>RNE</sup> (t −<sup>RNE</sup> s) as a candidate solution. Although counterexamples were found for this candidate, we noticed that it satisfied over 98% of the specification, and a visualization of its I/O behavior showed patterns similar to those of the invertibility condition we were solving for. Based on these observations, we focused our grammar on literals of this form. In particular, we included as a builtin symbol of our grammar a function that takes two floating-point values x, y and two rounding modes R<sub>1</sub>, R<sub>2</sub> as arguments and returns x +<sup>R<sub>1</sub></sup> (y −<sup>R<sub>2</sub></sup> x). We refer to such a function as a *residual* computation of y, noting that its value is often approximately y. By including various functions for residual computations, we focused the effort of the synthesizer on more interesting predicates. The end solution involved multiple residual computations, as shown in Table 2. Our initial solution was specific to the rounding mode RNE. After solving for several other rounding modes, we were able to construct a parametric solution that was correct for all rounding modes. In total, it took roughly three days of developer time to discover the generalized invertibility condition for addition with equality. Many of the subsequent invertibility conditions took a matter of hours, since by then we had a good intuition for the residual computations that were relevant for each case.
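The residual computation can be tried out directly with Python's binary64 floats. Because Python arithmetic always rounds to nearest-even, this minimal sketch only covers R<sub>1</sub> = R<sub>2</sub> = RNE, and `residual` and `candidate_ic` are illustrative names. The example shows that the residual is often equal to y but can collapse entirely (here to 0) when x is much larger in magnitude than y.

```python
def residual(x: float, y: float) -> float:
    """Residual computation of y: x + (y - x), each step rounded to nearest-even."""
    return x + (y - x)


def candidate_ic(s: float, t: float) -> bool:
    """The single-literal candidate t ≈ s + (t - s), RNE only."""
    return t == residual(s, t)


print(residual(1.0, 0.5), candidate_ic(1.0, 0.5))              # 0.5 True
print(residual(2.0 ** 60, 1.0), candidate_ic(2.0 ** 60, 1.0))  # 0.0 False
```

In the second line, 1.0 − 2<sup>60</sup> rounds to −2<sup>60</sup> because the spacing of binary64 values near 2<sup>60</sup> is 128, so the residual of 1.0 is 0.0.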

Invertibility conditions involving rem, fma, isNorm, and isSub were challenging and required further customizations to the grammar, for instance to include constants that corresponded to the minimum and maximum normal and subnormal values. Three-dimensional invertibility conditions (which in this work is limited to cases of fma with binary predicates) were especially challenging since the domain of their conjecture is a factor of 227 larger for F<sup>3</sup>,<sup>5</sup> than the others. Following our strategy for solving the invertibility conditions for specific formats and rounding modes, in ongoing work we are investigating solving these cases by first solving the invertibility condition for a fixed value c for one of its free variables u. Solving a two-dimensional problem of this form with a solution ϕ may suggest a generalization that works for all values of u where all occurrences of c in ϕ are replaced by u.

We found the side condition feature of our workflow important for narrowing down which subdomain was the most challenging for the conjecture in question. For instance, for some cases it was very easy to find invertibility conditions that held when both s and t were normal (resp., subnormal), but very difficult when s was normal and t was subnormal or vice versa.

We also implemented a fully automated mode for the synthesis loop in Fig. 5. However, in practice, it was more effective to tweak the generated solutions manually. The amount of user interaction was not prohibitively high in our experience.

Finally, we found that it was often helpful to visualize the input/output behavior of candidate solutions. In many cases, the difference between a candidate solution and the desired behavior of the invertibility condition would reveal a required modification to the grammar or would suggest which parts of the domain of the conjecture to focus on.

#### **4.1 Verifying Conditions for Multiple Formats and Rounding Modes**

We verified the correctness of all 167 invertibility conditions by checking them against their corresponding full I/O specifications for floating-point formats F<sub>3,5</sub>, F<sub>4,5</sub>, and F<sub>4,6</sub> and all rounding modes, which required 1.6 days of CPU time. This is relatively cheap compared to computing the specifications, since checking essentially amounts to evaluating the invertibility conditions on all possible input values. However, this approach quickly becomes infeasible with increasing precision, since the time required for computing the I/O specification increases roughly by a factor of 8 for each bit.
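A back-of-the-envelope reading of the factor of 8, assuming the cost of each table entry grows linearly with the number N of candidate values of x (an idealization, since each entry is computed with an SMT query): adding one bit roughly doubles the number of values, which quadruples the number of (s, t) entries and doubles the search space per entry. The CPU times reported earlier in this section are consistent with this estimate:

$$\underbrace{(2N)^2}\_{(s,\,t)\text{ entries}} \cdot \underbrace{2N}\_{\text{values of } x} = 8 \cdot N^3, \qquad \frac{54.7}{6.1} \approx 9.0, \qquad \frac{398.5}{54.7} \approx 7.3$$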

As a consequence, we generated quantified floating-point problems to verify the 167 invertibility conditions for formats F<sub>3,5</sub>, F<sub>4,5</sub>, F<sub>4,6</sub>, F<sub>5,11</sub> (Float16), F<sub>8,24</sub> (Float32), and F<sub>11,53</sub> (Float64) and all rounding modes. Each problem checks the T*FP*-unsatisfiability of formula ¬(φ<sub>c</sub> ⇔ ∃x. l[x]), where l[x] corresponds to the floating-point literal, and φ<sub>c</sub> to its invertibility condition. In total, we generated 3786 problems (116 × 5 + 51 for each floating-point format)<sup>3</sup> and checked them using CVC4 [5] (master 546bf686) and Z3 [16] (version 4.8.4).

**Fig. 6.** Recursive procedure QEFP for computing quantifier elimination for *x* in the unit linear formula ∃*x. P*(*t*<sub>1</sub>*, ..., t*<sub>j</sub>[*x*]*, ..., t*<sub>n</sub>). The free variables in this formula and the fresh variable *y* are implicitly universally quantified. The operator placeholder in the figure denotes a floating-point operator from Table 1.

We consider an invertibility condition to be verified for a floating-point format and rounding mode if at least one solver reports unsatisfiable. Given a CPU time limit of one hour and a memory limit of 8 GB for each solver/benchmark pair, we were able to verify 3577 (94.5%) invertibility conditions overall, with 99.2% of F<sub>3,5</sub>, 99.7% of F<sub>4,5</sub>, 100% of F<sub>4,6</sub>, 93.8% of F<sub>5,11</sub>, 90.2% of F<sub>8,24</sub>, and 84% of F<sub>11,53</sub>. This verification with CVC4 and Z3 required a total of 32 days of CPU time. All verification jobs were run on cluster nodes with Intel Xeon E5-2637 3.5 GHz and 32 GB memory.
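The problem set can be sketched as a generator of SMT-LIB 2 scripts. This is a minimal sketch under assumptions that are ours: ≈ is rendered as `fp.eq`, the five rounding modes are the SMT-LIB constants RNE, RNA, RTP, RTN and RTZ, and invertibility conditions are passed in as strings with a `{rm}` placeholder. The `"true"` conditions in the usage lines are dummies standing in for the actual conditions of Sect. 3, and `verification_problem` and `all_problems` are hypothetical names. The count reproduces 6 × (116 × 5 + 51) = 3786.

```python
from typing import List, Tuple

FORMATS = [(3, 5), (4, 5), (4, 6), (5, 11), (8, 24), (11, 53)]  # (eb, sb)
ROUNDING_MODES = ["RNE", "RNA", "RTP", "RTN", "RTZ"]


def verification_problem(eb: int, sb: int, ic: str, literal: str) -> str:
    """SMT-LIB 2 script that is unsat iff ic <=> (exists x. literal) is valid."""
    sort = f"(_ FloatingPoint {eb} {sb})"
    return "\n".join([
        "(set-logic FP)",
        f"(declare-const s {sort})",
        f"(declare-const t {sort})",
        f"(assert (not (= {ic} (exists ((x {sort})) {literal}))))",
        "(check-sat)",
    ])


def all_problems(rm_dependent: List[Tuple[str, str]],
                 rm_independent: List[Tuple[str, str]]) -> List[str]:
    """One problem per format, condition and (where relevant) rounding mode."""
    problems = []
    for eb, sb in FORMATS:
        for ic, lit in rm_dependent:
            for rm in ROUNDING_MODES:
                problems.append(verification_problem(
                    eb, sb, ic.format(rm=rm), lit.format(rm=rm)))
        for ic, lit in rm_independent:
            problems.append(verification_problem(eb, sb, ic, lit))
    return problems


# Dummy conditions ("true") standing in for the 116 + 51 conditions of Sect. 3.
dependent = [("true", "(fp.eq (fp.add {rm} x s) t)")] * 116
independent = [("true", "(fp.eq (fp.rem x s) t)")] * 51
problems = all_problems(dependent, independent)
print(len(problems))  # 3786
```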

#### **5 Quantifier Elimination for Unit Linear Floating-Point Formulas**

Based on the invertibility conditions presented in Sect. 3, we can define a quantifier elimination procedure for a restricted fragment of floating-point formulas. The procedure applies to *unit linear* formulas, that is, formulas of the form ∃x. P[x] where P is a Σ<sub>FP</sub>-literal containing exactly one occurrence of x.

Figure 6 gives a quantifier elimination procedure QEFP for unit linear floating-point formulas ∃x. P[x]. We write getIC(y, Q[y]) to denote the invertibility condition for y in Q[y], which amounts to a table lookup for the appropriate condition as given in Sect. 3. Note that our procedure is currently a partial function because we do not yet have invertibility conditions for some unit linear formulas. The recursive procedure returns a conjunction of conditions based on the path on which x occurs in P. If x occurs beneath multiple nested function applications, a fresh variable y is introduced and used for referencing the intermediate result of the subterm we are currently solving for. We demonstrate this in the following example.

*Example 2.* Consider the unit linear formula ∃x. (x ·<sup>R</sup> u) +<sup>R</sup> s ≥ t. Invoking the procedure QEFP on this input yields, after two recursive calls, the conjunction

$$\mathsf{getIC}(y\_1, y\_1 \overset{\mathsf{R}}{+} s \ge t) \land \mathsf{getIC}(y\_2, y\_2 \overset{\mathsf{R}}{\cdot} u \approx y\_1) \land \mathsf{getIC}(x, x \approx y\_2)$$

where y<sub>1</sub> and y<sub>2</sub> are fresh variables. The third conjunct is trivially equivalent to ⊤. This formula is quantifier-free and has the properties specified by the following theorem.

**Theorem 1.** *Let* ∃x. P *be a unit linear formula and let* I *be a model of* T*FP*. *Then,* I *satisfies* ¬∃x. P *if and only if there exists a model* J *of* T*FP* *(constructible from* I*) that satisfies* ¬QEFP(∃x. P)*.*
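The recursion of QEFP on Example 2 can be sketched in a few lines. The sketch below is a hypothetical illustration, not our implementation: terms are restricted to binary operators, rounding modes are omitted, `==` stands for ≈, and `get_ic` merely records the lookup instead of returning a condition from Sect. 3.

```python
from itertools import count
from typing import Iterator, List, Tuple, Union

# A term is a variable name or a binary application (op, left, right);
# a literal uses the same shape with a predicate in place of op.
Term = Union[str, Tuple[str, "Term", "Term"]]
Lit = Tuple[str, Term, Term]


def show(t: Term) -> str:
    """Print a term infix, parenthesizing nested applications."""
    if isinstance(t, str):
        return t
    op, a, b = t
    wrap = lambda u: u if isinstance(u, str) else f"({show(u)})"
    return f"{wrap(a)} {op} {wrap(b)}"


def contains(t: Term, x: str) -> bool:
    return t == x if isinstance(t, str) else contains(t[1], x) or contains(t[2], x)


def get_ic(y: str, lit: Lit) -> str:
    """Stand-in for the table lookup of Sect. 3: only records the query."""
    pred, lhs, rhs = lit
    return f"getIC({y}, {show(lhs)} {pred} {show(rhs)})"


def qe_fp(x: str, lit: Lit, fresh: Iterator[str]) -> List[str]:
    """Conjuncts produced for ∃x. lit, where x occurs exactly once in lit."""
    pred, lhs, rhs = lit
    j = 1 if contains(lhs, x) else 2         # argument t_j of P holding x
    tj = lit[j]
    if tj == x:                              # x is a direct argument of P
        return [get_ic(x, lit)]
    op, a, b = tj
    k = 1 if contains(a, x) else 2           # immediate subterm holding x
    y = next(fresh)
    new_tj = (op, y, b) if k == 1 else (op, a, y)
    new_lit = (pred, new_tj, rhs) if j == 1 else (pred, lhs, new_tj)
    # Solve for y in P, then recurse on the equation (subterm == y).
    return [get_ic(y, new_lit)] + qe_fp(x, ("==", tj[k], y), fresh)


lit = (">=", ("+", ("*", "x", "u"), "s"), "t")   # (x * u) + s >= t
fresh = (f"y{i}" for i in count(1))
print(" ∧ ".join(qe_fp("x", lit, fresh)))
# getIC(y1, y1 + s >= t) ∧ getIC(y2, y2 * u == y1) ∧ getIC(x, x == y2)
```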

<sup>3</sup> 116 invertibility conditions for rounding-mode-dependent operators and 51 invertibility conditions where the operator is rounding-mode-independent (e.g., rem).

Niemetz et al. [26] present a similar algorithm for solving unit linear bit-vector literals. In that work, a counterexample-guided loop was devised that made use of Hilbert-choice expressions for representing quantifier instantiations. In contrast to that work, we provide here only a quantifier elimination procedure. Extending our techniques to a general quantifier instantiation strategy is the subject of ongoing work. We discuss our preliminary work in this direction in the next section.

#### **6 Solving Quantified Floating-Point Formulas**

We implemented a prototype extension of the SMT solver CVC4 that leverages the results of the previous section to determine the satisfiability of quantified floating-point formulas. To handle quantified formulas, CVC4 uses a basic model-based instantiation loop (see, e.g., [30,32] for instantiation approaches for other theories). This technique maintains a quantifier-free set of constraints F corresponding to instantiations of universally quantified formulas. It terminates with the response "unsatisfiable" if F is unsatisfiable, and with "satisfiable" if it can show that the given quantified formulas are satisfied by a model of T*FP* that satisfies F. For T*FP*, the instantiations are substitutions of universally quantified variables by concrete floating-point values, e.g. ∀x. P(x) ⇒ P(0), which can be highly inefficient in the worst case for higher precisions.

We extend this basic loop with a preprocessing pass that generates theory lemmas based on the invertibility conditions corresponding to literals of quantified formulas ∀x. P with exactly one occurrence of x, as explained in the example below.

*Example 3.* Suppose the current set S of formulas contains a formula ϕ of the form ∀x. ¬((x · u) + s ≥ t ∧ Q(x)) where u, s and t are ground terms; then we add the following formula to S, where y<sub>1</sub> and y<sub>2</sub> are fresh (free) variables:

$$(\mathsf{getIC}(y\_1, y\_1 + s \ge t) \Rightarrow y\_1 + s \ge t) \land (\mathsf{getIC}(y\_2, y\_2 \cdot u \approx y\_1) \Rightarrow y\_2 \cdot u \approx y\_1)$$

The addition of this lemma is satisfiability preserving because, if the invertibility condition holds for y<sub>1</sub> + s ≥ t (resp., y<sub>2</sub> · u ≈ y<sub>1</sub>), then that literal has a solution, and the fresh variable y<sub>1</sub> (resp., y<sub>2</sub>) can be chosen to be one. We then add the instantiation lemma ϕ ⇒ ¬((y<sub>2</sub> · u) + s ≥ t ∧ Q(y<sub>2</sub>)). Although x is not necessarily linear in the body of ϕ, if both invertibility conditions hold, then the combination of the above lemmas implies (y<sub>2</sub> · u) + s ≥ t, which together with the instantiation lemma allows the solver to infer that the remaining portion Q of the quantified formula cannot hold for y<sub>2</sub>. An inference of this form may be more productive than enumerating the possible values of x in instantiations.

*Evaluation.* We considered all 61 benchmarks from SMT-LIB [6] that contain quantified formulas over floating-point arithmetic (logic FP), which correspond to verification conditions from the software verification competition that use a floating-point encoding [19]. The invertibility conditions required for solving their literals include floating-point addition, multiplication and division (both arguments) with equality and inequality. We implemented all invertibility conditions needed for these cases. We extended our SMT solver CVC4 (GitHub master 5d248c36) with the above preprocessing pass (GitHub cav19fp 9b5acd74), and compared the performance of CVC4 with (configuration CVC4-ext) and without (configuration CVC4-base) this preprocessing pass against the SMT solver Z3 (version 4.8.4). All experiments were run on the same cluster mentioned earlier, with a memory limit of 8 GB and a time limit of 1800 s. Overall, CVC4-base solved 35 benchmarks within the time limit (with no benchmarks uniquely solved compared to CVC4-ext), CVC4-ext solved 42 benchmarks (7 of them uniquely solved compared to the base version), and Z3 solved 56 benchmarks. While CVC4-ext solves significantly fewer benchmarks than Z3, we believe that its improvement over CVC4-base indicates that our approach based on invertibility conditions shows potential for solving quantified floating-point constraints in SMT solvers. A more comprehensive implementation and evaluation is left as future work.

#### **7 Conclusion**

We have presented invertibility conditions for a large subset of combinations of floating-point operators over floating-point predicates supported by SMT solvers. These conditions were found by a framework that utilizes syntax-guided synthesis solving, customized for our problem and developed over the course of this work. We have shown that invertibility conditions imply that a simple fragment of quantified floating-point formulas admits compact quantifier elimination, and have given preliminary evidence that an SMT solver that partially leverages this technique can have a higher success rate on floating-point problems coming from a software verification application.

For future work, we plan to extend techniques for quantified and quantifier-free floating-point formulas to incorporate our findings, in particular to lift previous quantifier instantiation approaches (e.g., [26]) and local search procedures (e.g., [25]) for bit-vectors to floating-point arithmetic. We also plan to extend and use our synthesis framework for related challenging synthesis tasks, such as finding conditions under which more complex constraints have solutions, including those having multiple occurrences of a variable to solve for. Our synthesis framework is agnostic to theories and can be used for any sort with a small finite domain. It can thus also be leveraged for solving quantified bit-vector constraints. Finally, we would like to establish formal proofs of correctness of our invertibility conditions that are independent of floating-point formats.

#### **References**

1. Alur, R., et al.: Syntax-guided synthesis. In: Formal Methods in Computer-Aided Design, FMCAD 2013, Portland, 20–23 October 2013, pp. 1–8. IEEE (2013). http://ieeexplore.ieee.org/document/6679385/


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Numerically-Robust Inductive Proof Rules for Continuous Dynamical Systems**

Sicun Gao<sup>1</sup>, James Kapinski<sup>2</sup>, Jyotirmoy Deshmukh<sup>3</sup>, Nima Roohi<sup>1</sup>(B), Armando Solar-Lezama<sup>4</sup>, Nikos Arechiga<sup>5</sup>, and Soonho Kong<sup>5</sup>

<sup>1</sup> University of California, San Diego, La Jolla, USA, {sicung,nroohi}@ucsd.edu

<sup>2</sup> Toyota R&D, Gardena, USA, jim.kapinski@toyota.com

<sup>3</sup> University of Southern California, Los Angeles, USA, jyotirmoy.deshmukh@usc.edu

<sup>4</sup> Massachusetts Institute of Technology, Cambridge, USA, asolar@csail.mit.edu

<sup>5</sup> Toyota Research Institute, Cambridge, USA, {nikos.arechiga,soonho.kong}@tri.global

**Abstract.** We formulate numerically-robust inductive proof rules for unbounded stability and safety properties of continuous dynamical systems. These induction rules robustify standard notions of Lyapunov functions and barrier certificates so that they can tolerate small numerical errors. In this way, numerically-driven decision procedures can establish a sound and relative-complete proof system for unbounded properties of very general nonlinear systems. We demonstrate the effectiveness of the proposed rules for rigorously verifying unbounded properties of various nonlinear systems, including a challenging powertrain control model.

#### **1 Introduction**

Infinite-time stability and safety properties of continuous dynamical systems are typically established via inductive arguments over continuous time. For instance, proving stability of a dynamical system is similar to proving termination of a program. A system is stable at the origin in the sense of Lyapunov if one can find a Lyapunov function (essentially a ranking function) that is positive everywhere except at the origin, where it is exactly zero, and that never increases over time along the direction of the system dynamics [11]. Likewise, proving unbounded safety of a dynamical system requires one to find a barrier function (or differential invariant [19]) that separates the system's initial states from the unsafe regions, such that whenever the system state reaches the barrier, the system dynamics points towards the safe side of the barrier [21]. In both cases, once a candidate certificate (a Lyapunov or barrier function) is proposed, the verification problem is reduced to checking the validity of a universally-quantified first-order formula over real-valued variables. The standard approaches for the validation step use symbolic quantifier elimination [4] or Sum-of-Squares techniques [17,18,24]. However, these algorithms are either extremely expensive or numerically brittle. Most importantly, they cannot handle systems with non-polynomial nonlinearities, and thus fall short of a general framework for verifying practical systems of significant complexity.
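As a standard textbook illustration (not taken from this paper), consider the scalar system ẋ = −x with the candidate V(x) = x². Checking the Lyapunov conditions then reduces to the following facts about V and its derivative along the dynamics:

$$V(0) = 0, \qquad V(x) = x^2 > 0 \ \text{for } x \neq 0, \qquad \dot V(x) = V'(x)\,\dot x = 2x \cdot (-x) = -2x^2 < 0 \ \text{for } x \neq 0$$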

The standard approach for checking invariance conditions in program analysis is to use Satisfiability Modulo Theories (SMT) solvers [16]. However, checking the inductive conditions for nonlinear dynamical systems requires solving nonlinear SMT problems over the real numbers, which are highly intractable or undecidable [23]. Recent work on numerically-driven decision procedures provides a promising direction for bypassing this difficulty [5,6]. Such procedures have been used for many bounded-time verification and synthesis problems for highly nonlinear systems [12]. However, the fundamental challenge with using numerically-driven methods in inductive proofs is that numerical errors make it impossible to verify the induction steps in the standard sense. Take the Lyapunov analysis of stability properties as an example. A dynamical system is stable if there exists a function that vanishes *exactly* at the origin and whose value *strictly* decreases over time along the system trajectories. Since *any* numerical error blurs the difference between strict and non-strict inequalities, one can conclude that numerically-driven methods are not suitable for verifying these strict constraints. However, proving that a system is stable within an arbitrarily tiny neighborhood around the origin is all we really need in practice. Thus, there is a discrepancy between what the standard theory requires and what is needed in practice, or what can be achieved computationally. To bridge this gap, we need to rethink the fundamental definitions.
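To make the last point concrete, continue with ẋ = −x and V(x) = x², and assume, as an illustrative simplification, a δ-weakening in the spirit of [7] that relaxes each atom g ≥ 0 (or g > 0) to g ≥ −δ; the disequality x ≠ 0, read as x > 0 ∨ −x > 0, then weakens to a valid formula. A violation of the decrease condition can never be robustly refuted near the origin, whereas excluding a ball of radius ε around the origin makes the weakened violation unsatisfiable for small enough δ, which is the intuition behind ε-stability:

$$\begin{aligned}
\exists x.\; x \neq 0 \wedge -2x^2 \ge 0 \quad &\leadsto\quad \exists x.\; -2x^2 \ge -\delta &&(\text{satisfied by } x = 0 \text{ for every } \delta > 0)\\
\exists x.\; |x| \ge \varepsilon \wedge -2x^2 \ge 0 \quad &\leadsto\quad \exists x.\; |x| \ge \varepsilon - \delta \wedge 2x^2 \le \delta &&(\text{unsatisfiable whenever } \varepsilon > \delta + \sqrt{\delta/2})
\end{aligned}$$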

In this paper, we formulate new inductive proof rules for continuous dynamical systems that establish robust notions of stability and safety. These proof rules are practically useful and computationally certifiable in a very general sense. For instance, for stability, we define the notion of ε-stability, which requires the system to be stable within an ε-bounded distance from the origin instead of exactly at the origin. When ε is small enough, ε-stable systems are practically indistinguishable from stable systems. We then define the notion of ε-Lyapunov functions, which are sufficient for establishing ε-stability, and rigorously prove that the ε-Lyapunov conditions are numerically stable and can be correctly determined by δ-complete decision procedures for nonlinear real arithmetic [7]. In this way, we can rely on various numerically-driven SMT solvers to establish a sound and relative-complete proof system for unbounded stability and safety properties of highly nonlinear dynamical systems. We believe these new definitions eliminate the core difficulty of reasoning about infinite-time properties of nonlinear systems, and will pave the way for adapting a wide range of automated methods from program analysis to continuous and hybrid systems. In short, the paper makes the following contributions:


strict contraction, and the latter relies on reachable-set computation to guarantee bounded escape.

– We prove that δ-complete decision procedures provide a sound and relative-complete proof system for the proposed numerically-robust induction rules, in both Sects. 3 and 4.

We demonstrate the effectiveness of the proposed methods on various nonlinear systems in Sect. 5. Section 2 covers the basic definitions and Sect. 6 concludes the paper.

**Related Work.** Several lines of work have proposed relaxed and practical notions to capture the spirit of the stability requirements. Early work from the 1960s introduced practical stability, which defined bounds on system behaviors over finite time horizons [2,14,26,27]. These methods can show whether a system leaves a safe set or enters a goal set over a finite time horizon based on Lyapunov-like functions. Stability defined in this sense is equivalent to estimating the reachable set over a finite time horizon. Consequently, it may not capture the desired behavior of the system over unbounded time. Similarly, notions of boundedness and ultimate boundedness specify limits on the system behaviors [11]. Boundedness specifies whether the system remains within a given bounded region. Ultimate boundedness specifies that the system eventually returns to the given bounded region. These properties can be established based on Lyapunov-like conditions. Related notions have been generalized to switched systems [29,30]. Also, the related notion of region stability defines systems that eventually enter and remain within a specified set [20]. We present stability concepts that unify and extend the above notions. A related relaxation of the traditional notions of stability includes *almost* Lyapunov functions [15], which allow the strict stability conditions to be neglected in a region near the equilibrium point. The challenge of applying this technique in practice is that the size and shape of the neglected region are not specified a priori, so a constructive technique for specifying a stability region is not straightforward. Our work is related to efforts to construct and check robust barrier certificates using Lyapunov-like functions to ensure that controllers satisfy safety constraints [28]. That work provides a framework in which to specify analytic constraints on controller behaviors.
By contrast, our work focuses on providing constraints that can be checked fully automatically. Our notion of ε-barrier functions is closely related to t-barrier certificates from [1], though we choose to focus on distance bounds from the barrier (ε) rather than time bounds that indicate how long it takes for behaviors to re-enter the barrier once it has left (t).

#### **2 Background**

#### **2.1 Dynamical Systems**

Throughout the paper, we use the following definition of an n-dimensional autonomous dynamical system:

$$\frac{\mathrm{d}x(t)}{\mathrm{d}t} = f(x(t)), \; x(0) \in \mathrm{init} \text{ and } \forall t \in \mathbb{R}\_{\geq 0}, x(t) \in D,\tag{1}$$

where an open set $D \subseteq \mathbb{R}^n$ is the state space, $\mathrm{init} \subseteq D$ is a set of initial states, and $f: D \to \mathbb{R}^n$ is a vector field specified by Lipschitz-continuous functions in each dimension. For notational simplicity, *all variable and function symbols can represent vectors*. When vectors are used in logic formulas, they represent conjunctions of the formulas for each dimension. For instance, when $x = (x_1, \dots, x_n)$, we write $x = 0$ to denote the formula $x_1 = 0 \land \dots \land x_n = 0$. For any system defined by (1), we write its solution function as

$$F: D \times \mathbb{R}\_{\geq 0} \to \mathbb{R}^n, \ F(x(0), t) = x(0) + \int\_0^t f(x(s)) ds. \tag{2}$$

Note that F usually does not have a closed analytic form. However, since f is Lipschitz-continuous, F exists and is unique. We will often use Lie derivatives to measure the change of a scalar function along the flow defined by a vector field:

**Definition 1 (Lie Derivative).** *Let $f: D \to \mathbb{R}^n$ define a vector field, and write the $i$th component of $f$ as $f_i$. Let $V: D \to \mathbb{R}$ be a differentiable scalar function. The Lie derivative of $V$ over $f$ is defined as* $\nabla_f V(x) = \sum_{i=1}^{n} \frac{\partial V}{\partial x_i} f_i(x)$.
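As a small illustration of Definition 1, the sketch below approximates the Lie derivative numerically with central finite differences. The helper name `lie_derivative`, the step size `h`, and the example vector field and function are hypothetical choices for illustration only; a verification tool would work with the symbolic expression of ∇_f V rather than a finite-difference estimate.

```python
from typing import Callable, Sequence


def lie_derivative(V: Callable[[Sequence[float]], float],
                   f: Callable[[Sequence[float]], Sequence[float]],
                   x: Sequence[float], h: float = 1e-6) -> float:
    """Approximate nabla_f V(x) = sum_i dV/dx_i(x) * f_i(x) with central differences."""
    fx = f(x)
    total = 0.0
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        dV_dxi = (V(xp) - V(xm)) / (2 * h)  # partial derivative of V in coordinate i
        total += dV_dxi * fx[i]
    return total


# Hypothetical example: V(x) = x1^2 + x2^2 along f(x) = (-x1 + x2, -x1 - x2).
def V(x):
    return x[0] ** 2 + x[1] ** 2


def f(x):
    return (-x[0] + x[1], -x[0] - x[1])


print(lie_derivative(V, f, (1.0, 2.0)))  # close to -2 * (1 + 4) = -10
```

For this choice the exact Lie derivative is −2(x₁² + x₂²), so V decreases along every trajectory except at the origin.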

#### **2.2 First-Order Language over the Reals** $\mathcal{L}_{\mathbb{R}_{\mathcal{F}}}$

We will make extensive use of first-order formulas over real numbers with Type 2 computable functions [25] to express and infer properties of nonlinear dynamical systems. Definition 2 introduces the syntax of these formulas.

**Definition 2 (Syntax of** $\mathcal{L}_{\mathbb{R}_{\mathcal{F}}}$**).** *Let* $\mathcal{F}$ *be the class of all Type 2 computable functions over real numbers. We define:*

$$\begin{aligned} t &::= x\_i \mid f(t(x)), \text{ where } f \in \mathcal{F}, \text{ possibly constant};\\ \varphi &::= \top \mid \bot \mid t(x) > 0 \mid t(x) \ge 0 \mid \varphi \land \varphi \mid \varphi \lor \varphi \mid \exists x\_i \varphi \mid \forall x\_i \varphi. \end{aligned}$$

We regard ¬ϕ as an operation that is defined inductively as usual. For instance, ¬(t > 0) is defined as −t ≥ 0, and ¬(∃x_i ϕ) is defined as ∀x_i ¬ϕ. For any $\mathcal{L}_{\mathbb{R}_{\mathcal{F}}}$ terms u and v, variable x, and $\mathcal{L}_{\mathbb{R}_{\mathcal{F}}}$ formula ϕ, we write $\exists^{[u,v]} x\,\varphi$ and $\forall^{[u,v]} x\,\varphi$ to denote $\exists x (u \le x \land x \le v \land \varphi)$ and $\forall x ((u \le x \land x \le v) \to \varphi)$, respectively; the same notation applies to open and half-open intervals. Next, Definition 3 introduces syntactic perturbations of $\mathcal{L}_{\mathbb{R}_{\mathcal{F}}}$ formulas.

**Definition 3 (**δ**-Strengthening and Robust Formulas** [7]**).** *Let $\delta \in \mathbb{Q}^+$ be arbitrary. Let* ϕ *be an arbitrary* $\mathcal{L}_{\mathbb{R}_{\mathcal{F}}}$ *formula. The* δ*-strengthening of* ϕ*, denoted by* $\varphi^{+\delta}$*, is obtained from* ϕ *by replacing every atomic predicate of the form* t(x) > 0 *and* t(x) ≥ 0 *with* t(x) − δ > 0 *and* t(x) − δ ≥ 0*, respectively. We say* ϕ *is* δ-robust *iff* $\varphi^{+\delta} \leftrightarrow \varphi$*.*
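To make the purely syntactic nature of δ-strengthening concrete, here is a minimal sketch over a hypothetical abstract syntax tree for quantifier-free formulas in one real variable (the classes `Gt`, `Ge`, `And`, `Or` and the functions `strengthen` and `holds` are illustrative names, not part of any solver). Quantifiers are omitted for brevity, and there is no negation node because Definition 2 treats ¬ as a derived operation. The example exhibits a point where ϕ holds but ϕ^{+δ} does not.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Gt:  # atom t(x) > 0
    t: Callable[[float], float]


@dataclass(frozen=True)
class Ge:  # atom t(x) >= 0
    t: Callable[[float], float]


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


Formula = Union[Gt, Ge, And, Or]


def strengthen(phi: Formula, delta: float) -> Formula:
    """Return phi^{+delta}: every atom t > 0 / t >= 0 becomes t - delta > 0 / t - delta >= 0."""
    if isinstance(phi, Gt):
        return Gt(lambda x, t=phi.t: t(x) - delta)
    if isinstance(phi, Ge):
        return Ge(lambda x, t=phi.t: t(x) - delta)
    if isinstance(phi, And):
        return And(strengthen(phi.left, delta), strengthen(phi.right, delta))
    return Or(strengthen(phi.left, delta), strengthen(phi.right, delta))


def holds(phi: Formula, x: float) -> bool:
    """Evaluate a quantifier-free formula at the point x."""
    if isinstance(phi, Gt):
        return phi.t(x) > 0
    if isinstance(phi, Ge):
        return phi.t(x) >= 0
    if isinstance(phi, And):
        return holds(phi.left, x) and holds(phi.right, x)
    return holds(phi.left, x) or holds(phi.right, x)


phi = Ge(lambda x: x - 1.0)  # x - 1 >= 0
print(holds(phi, 1.05), holds(strengthen(phi, 0.1), 1.05))  # True False
```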

**Definition 4 (**δ**-Complete Decision Procedures** [7]**).** *Let* S *be a class of* $\mathcal{L}_{\mathbb{R}_{\mathcal{F}}}$*-sentences. We say a decision procedure is* δ*-complete over* S *iff for any* ϕ ∈ S*, the procedure correctly returns one of the following answers:*

– true: ϕ is true.

– δ-false: $\varphi^{+\delta}$ is false.

*When the two cases overlap, either decision can be returned.*

It follows that if ϕ is δ-robust, then a δ-complete decision procedure can correctly determine the truth value of ϕ.
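The two permitted answers of Definition 4 can be illustrated with a toy branch-and-bound procedure for the restricted class of sentences ∀x ∈ [a, b]. t(x) > 0. This is a sketch under stated assumptions, not how dReal works internally: the function name `delta_decide_forall_pos` is hypothetical, it assumes the caller supplies a valid Lipschitz constant for t, and it uses plain floating-point arithmetic without the outward rounding a real δ-complete solver would need (solvers such as dReal rely on interval constraint propagation instead).

```python
def delta_decide_forall_pos(t, a, b, lipschitz, delta):
    """Toy delta-complete check of the sentence  forall x in [a, b]. t(x) > 0.

    Assumes (hypothetically) that `lipschitz` is a valid Lipschitz constant of t on [a, b].
    Returns "true" only when every sub-box is certified positive, and "delta-false" only
    when a point x with t(x) - delta <= 0 is found, i.e. the delta-strengthening is false.
    """
    stack = [(a, b)]
    while stack:
        lo, hi = stack.pop()
        mid = 0.5 * (lo + hi)
        value = t(mid)
        if value <= delta:
            return "delta-false"  # witness that phi^{+delta} is false
        if value - lipschitz * (hi - lo) / 2 > 0:
            continue  # Lipschitz bound certifies t > 0 on all of [lo, hi]
        # Reached only when hi - lo > 2 * delta / lipschitz, so splitting terminates.
        stack.append((lo, mid))
        stack.append((mid, hi))
    return "true"


print(delta_decide_forall_pos(lambda x: x * x + 0.5, -1.0, 1.0, 2.0, 0.01))    # true
print(delta_decide_forall_pos(lambda x: x * x, -1.0, 1.0, 2.0, 0.01))          # delta-false
print(delta_decide_forall_pos(lambda x: x * x + 0.005, -1.0, 1.0, 2.0, 0.01))  # delta-false
```

The third call shows the overlap case: the sentence is true, but its δ-strengthening is false, so returning δ-false is permitted. For a δ-robust sentence this overlap cannot occur, which is why the answer is then always correct.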

#### **3 Robust Proofs for Stability**

We first focus on stability. We will define the notion of ε-stability, as a relaxation of the standard Lyapunov stability, and then define ε-Lyapunov functions, which are sufficient for proving ε-stability in a robust way.

#### **3.1 Stability and Lyapunov Functions**

The standard definition of stability is conventionally stated with ε and δ, to highlight its connection with the ε-δ definition of continuity. Since we mostly reserve ε for conditions that are robust under ε-bounded numerical errors, we replace ε by τ in the standard definitions to avoid confusion.

**Definition 5 (Stability).** *We say the system in (1) is stable at the origin in the sense of Lyapunov iff for any* τ*-ball neighborhood of the origin, there exists a* δ*-ball around the origin such that, if the system starts within the* δ*-ball, then it never escapes the* τ*-ball. We capture the definition by the following* $\mathcal{L}_{\mathbb{R}_{\mathcal{F}}}$*-formula:*

$$\mathsf{Stable}(f) \equiv\_{df} \forall^{(0,\infty)} \tau \exists^{(0,\infty)} \delta \forall^D x\_0 \forall^{[0,\infty)} t \left( \|x\_0\| < \delta \to \|F(x\_0, t)\| < \tau \right)$$

**Definition 6 (Lyapunov Function).** *Consider a dynamical system given in the form of (1), and let $V: D \to \mathbb{R}$ be a differentiable function. We say* V *is a non-strict Lyapunov function for the system iff the following predicate is true:*

$$\mathsf{LF}(f, V) \equiv_{df} (V(0) = 0) \land (f(0) = 0) \land \forall^{D \setminus \{0\}} x \left( V(x) > 0 \land \nabla_f V(x) \le 0 \right)$$

**Proposition 1.** *For any dynamical system defined by* f*, if there exists a Lyapunov function* V *, then the system is stable. Namely,* LF(f,V ) → Stable(f)*.*

#### **3.2 Epsilon-Stability**

The standard definition of stability requires a system to stabilize within arbitrarily small neighborhoods of the origin. However, very small neighborhoods are practically indistinguishable from the origin, so it is practically sufficient to prove that a system is stable within some sufficiently small neighborhood. We capture this intuition with a minor change to the standard definition: we put a lower bound ε on the τ parameter in Definition 5. As a result, the system is required to exhibit the same behavior as standard stable systems outside the ε-ball, but can behave arbitrarily within the ε-ball (for instance, oscillate around the origin). The formal definition is as follows:

**Fig. 1.** Standard and ε-relaxed notions of stability and Lyapunov functions

**Definition 7 (Epsilon-Stability).** *Let $\varepsilon \in \mathbb{R}^+$ be arbitrary. We say a dynamical system in (1) is* ε*-stable at the origin in the sense of Lyapunov iff it satisfies the following condition:*

$$\mathsf{Stable}\_{\varepsilon}(f) \equiv\_{df} \forall^{[\varepsilon,\infty)} \tau \exists^{(0,\infty)} \delta \forall^{D} x\_{0} \forall^{[0,\infty)} t \Big( \|x\_{0}\| < \delta \to \|F(x\_{0},t)\| < \tau \Big).$$

*In words, for any* τ ≥ ε*, there exists* δ *such that all trajectories that start within the* δ*-ball will stay within a* τ *-ball around the origin.*

Note that the only difference with the standard definition is that τ is *bounded from below* by a positive ε instead of 0. The definition is depicted in Fig. 1c, which shows the difference with the standard notion in Fig. 1a. Since the only difference with the standard definition is the lower bound on the universally quantified τ , it is clear that ε-stability is strictly weaker than standard stability.

**Proposition 2.** *For any $\varepsilon \in \mathbb{R}^+$,* $\mathsf{Stable}(f) \to \mathsf{Stable}_\varepsilon(f)$*.*

Thus, any system that is stable in the standard definition is also ε-stable for any $\varepsilon \in \mathbb{R}^+$. On the other hand, one can always choose ε small enough that an ε-stable system is practically indistinguishable from a system that is stable in the standard sense.
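To see why ε-stability is strictly weaker than stability, consider the hypothetical one-dimensional system ẋ = −x³ + μx with μ = 0.01 (an illustrative choice, not one of the paper's examples). The origin is an unstable equilibrium, yet every trajectory starting in (−0.2, 0.2) stays in that interval and converges to ±√μ = ±0.1. The sketch below simulates one trajectory with the classical fourth-order Runge-Kutta method; a simulation is only evidence, not a proof.

```python
def rk4_trajectory(f, x0, dt, steps):
    """Integrate the scalar ODE dx/dt = f(x) with classical fourth-order Runge-Kutta."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x = x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        xs.append(x)
    return xs


MU = 0.01  # hypothetical parameter: equilibria at 0 (unstable) and +/- sqrt(MU) = +/- 0.1


def f(x):
    return -x ** 3 + MU * x


traj = rk4_trajectory(f, 1e-6, 0.5, 6000)   # covers t in [0, 3000]
print(max(abs(x) for x in traj), traj[-1])  # both close to 0.1
```

Starting from x₀ = 10⁻⁶, the state drifts out to roughly 0.1, so no δ-ball keeps trajectories inside a τ-ball with τ < 0.1 and the system is not stable; the trajectory nevertheless remains inside the 0.2-ball, consistent with ε-stability for ε = 0.2.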

#### **3.3 Epsilon-Lyapunov Function**

We now define the corresponding notion of Lyapunov function that can be used for proving ε-stability. The robustness problem in the standard definition comes from the singularity of the origin. With the relaxed notion of stability, the system may oscillate within some ε-neighborhood of the origin, which leaves room for constructing a few nested neighborhoods that trap the trajectories in a way that is robust under sufficiently small perturbations. To achieve this, we make use of balls of different sizes, as shown in the following definition. We write $\mathcal{B}_\varepsilon$ to denote the open ε-ball around the origin.

**Definition 8 (Epsilon-Lyapunov Functions).** *Let $V: D \to \mathbb{R}$ be a differentiable scalar function defined for the system in (1), and let $\varepsilon \in \mathbb{R}^+$ be an arbitrary value. We say* V *is an* ε*-Lyapunov function for the system iff it satisfies the following conditions:*


*In sum, the three conditions can be expressed with the following* $\mathcal{L}_{\mathbb{R}_{\mathcal{F}}}$*-formula:*

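$$\mathsf{LF}_\varepsilon(f,V) \equiv_{df} \exists^{(0,\varepsilon)}\varepsilon'\, \exists^{(0,\infty)}\alpha\, \exists^{(0,\alpha)}\beta\, \exists^{(0,\infty)}\gamma \Big( \forall^{D\setminus\mathcal{B}_\varepsilon} x\; V(x) \ge \alpha \;\land\; \forall^{\mathcal{B}_{\varepsilon'}} x\; V(x) \le \beta \;\land\; \forall^{D\setminus\mathcal{B}_{\varepsilon'}} x\; \nabla_f V(x) \le -\gamma \Big)$$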

It is important to note that ε′, α, β, and γ are not fixed constants but existentially quantified variables. Thus, the condition can hold for infinitely many values of these parameters, which is critical to robustness. The only free variable in the formula is ε, which appears in $\mathcal{B}_\varepsilon$ and in the upper bound on ε′. Note also that neither LFε(f,V) nor the standard LF(f,V) implies the other.

*Remark 1.* The logical structure of LFε(f,V ) is seemingly more complex than the standard Lyapunov conditions in Definition 6 because of the extra existential quantification. In Theorem 3, we show that it does not add computational complexity in checking the conditions.

The key result is that the conditions for an ε-Lyapunov function are sufficient for establishing ε-stability.

**Theorem 1.** *If there exists an* ε*-Lyapunov function* V *for a dynamical system defined by* f*, then the system is* ε*-stable. Namely,* LFε(f,V ) → Stableε(f)*.*

*Proof.* Let τ ≥ ε be arbitrary, and let $\alpha, \gamma \in \mathbb{R}^+$, $\beta \in (0, \alpha)$, and $\varepsilon' \in (0, \varepsilon)$ be as specified by the definition of LFε(f,V). Let $x_0 \in \mathcal{B}_{\varepsilon'}$ be an arbitrary point. For any $t \in \mathbb{R}_{\ge 0}$, let $x(t) := F(x_0, t)$ be the system state as defined in (2). We prove by contradiction that for any $t \in \mathbb{R}_{\ge 0}$, the inequality $\|x(t)\| < \varepsilon \le \tau$ holds, so that δ := ε′ witnesses Definition 7. Suppose $\|x(t)\| \ge \varepsilon$ for some t. Since ε′ < ε and $F(x_0, \cdot)$ is continuous, there exist $t_1$ and $t_2$ satisfying the following conditions ($\partial\mathcal{B}_{\varepsilon'}$ and $\partial\mathcal{B}_\varepsilon$ are the boundaries of the corresponding balls):

$$0 \le t\_1 < t\_2 \le t, \quad x(t\_1) \in \partial \mathcal{B}\_{\varepsilon'}, \quad x(t\_2) \in \partial \mathcal{B}\_{\varepsilon}, \quad \forall^{(t\_1, t\_2)} t' \left( x(t') \in \mathcal{B}\_{\varepsilon} \backslash \mathcal{B}\_{\varepsilon'} \right).$$

We know $V(x(t_1)) \le \beta < \alpha \le V(x(t_2))$, and hence $V(x(t_1)) < V(x(t_2))$. However, this contradicts the mean value theorem together with the facts that $\mathcal{B}_\varepsilon \subset D$ and $\forall^{D\setminus\mathcal{B}_{\varepsilon'}} x\; \nabla_f V(x) \le -\gamma$: on $[t_1, t_2]$ the trajectory stays in $D \setminus \mathcal{B}_{\varepsilon'}$, where V strictly decreases along it.

*Remark 2.* The proof of Theorem 1 shows that once the state of the system enters $\mathcal{B}_{\varepsilon'}$, it never leaves $\mathcal{B}_\varepsilon$. However, the state may still leave $\mathcal{B}_{\varepsilon'}$. On the other hand, since the closure of $\mathcal{B}_\varepsilon \setminus \mathcal{B}_{\varepsilon'}$ is bounded, and for every x in this region V is continuous at x and $\nabla_f V(x) \le -\gamma$, no trajectory can be trapped in the closure of $\mathcal{B}_\varepsilon \setminus \mathcal{B}_{\varepsilon'}$. Therefore, even though the state of the system might leave $\mathcal{B}_{\varepsilon'}$, it will visit the inside of this ball infinitely often.

*Example 1.* Consider the time-reversed Van der Pol system given by the following dynamics. Figure 3 shows the vector field of this system around the origin.

$$
\begin{bmatrix}
\dot{x}\_1\\ \dot{x}\_2
\end{bmatrix} = \begin{bmatrix}
\end{bmatrix}
$$

A Lyapunov function $z^T P z$, where $z^T = [x_1, x_2, x_1^2, x_1x_2, x_2^2, x_1^3, x_1^2x_2, x_1x_2^2, x_2^3]$ and P is the 9 × 9 constant matrix given in [8], is a degree-6 polynomial that can be obtained using the simulation-guided techniques from [10]. Using dReal [9] with $\delta := 10^{-25}$ and the Euclidean norm, we are able to prove that $z^T P z$ is a $10^{-12}$-Lyapunov function. Table 1 lists the parameters used for this proof.
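The monomial vector z in this Lyapunov candidate can be built mechanically: its nine entries are all monomials of degree 1 to 3 in x₁ and x₂, which is why zᵀPz has degree 6. The sketch below constructs z in the order listed above and evaluates the quadratic form. The helper names are hypothetical, and since the actual P from [8] is not reproduced here, the identity matrix serves as a placeholder.

```python
import math
from itertools import combinations_with_replacement


def monomial_vector(x, max_degree=3):
    """All monomials of degree 1..max_degree, grouped by degree.

    For x = (x1, x2) and max_degree = 3 the order is
    [x1, x2, x1^2, x1*x2, x2^2, x1^3, x1^2*x2, x1*x2^2, x2^3].
    """
    z = []
    for d in range(1, max_degree + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            z.append(math.prod(x[i] for i in idx))
    return z


def quadratic_form(P, z):
    """Evaluate z^T P z for a square matrix P given as a list of rows."""
    n = len(z)
    return sum(z[i] * P[i][j] * z[j] for i in range(n) for j in range(n))


# Hypothetical placeholder for P (the real 9 x 9 matrix is given in [8]).
P_demo = [[1.0 if i == j else 0.0 for j in range(9)] for i in range(9)]
z = monomial_vector((1.0, 2.0))
print(len(z), quadratic_form(P_demo, z))  # 9 111.0 (with P = I this is the sum of squares)
```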

#### **3.4 Automated Proofs with Delta-Decisions**

We now prove that unlike the conventional conditions, the new inductive proof rules are numerically robust. It follows that δ-decision procedures provide a sound and relative-complete proof system for establishing the conditions in the following sense:


To prove these properties, the key fact is that the continuity of the functions in the induction conditions ensures that there is room for numerical errors in the conditions. Consequently, the formulas allow δ-perturbations in their parameters. This is captured by Lemma 1, and the proof is given in [8].

**Lemma 1.** *For any $\varepsilon \in \mathbb{R}^+$, there exists $\delta \in \mathbb{Q}^+$ such that* LFε(f,V) *is* δ*-robust.*

Note that if a formula φ is δ-robust, then φ is also δ′-robust for every δ′ ∈ (0, δ). Soundness and relative completeness then follow naturally.

**Theorem 2 (Soundness).** *If a* δ*-complete decision procedure confirms that* LFε(f,V ) *is* true *then* V *is indeed an* ε*-Lyapunov function, and* f *is* ε*-stable.*

*Proof.* By Definition 4, LFε(f,V), exactly as specified in Definition 8, is true. Therefore, V is ε-Lyapunov, and by Theorem 1, f is ε-stable.

**Theorem 3 (Relative Completeness).** *For any $\varepsilon \in \mathbb{R}^+$, if* LFε(f,V) *is true, then there exists $\delta \in \mathbb{Q}^+$ such that any* δ*-complete decision procedure must return that* LFε(f,V) *is* true*.*

*Proof.* Fix an arbitrary $\varepsilon \in \mathbb{R}^+$ for which LFε(f,V) is true. Let φ := LFε(f,V), and using Lemma 1, let $\delta \in \mathbb{Q}^+$ be such that φ is δ-robust. Since φ is true, $\varphi^{+\delta}$ is true as well. By Definition 4, no δ-complete decision procedure can return δ-false for φ, so it must return true.

We remark that the existential quantifiers over α, β, and γ in Definition 8 can be eliminated without extra search steps, so SMT solving is needed only for the universally quantified subformulas. The reason is that suitable values of α, β, and γ can be found by estimating the ranges of V(x) and $\nabla_f V(x)$ in the different neighborhoods. In fact, we can rewrite LFε(f,V) in the following way to eliminate α, β, and γ:

$$\mathsf{LF}_{\varepsilon}(f, V) \leftrightarrow \exists^{(0, \varepsilon)} \varepsilon' \left( \sup_{x \in \mathcal{B}_{\varepsilon'}} V(x) < \inf_{x \in D\setminus \mathcal{B}_{\varepsilon}} V(x) \land \sup_{x \in D\setminus \mathcal{B}_{\varepsilon'}} \nabla_f V(x) < 0 \right)$$

Note that in this form the universal quantification is implicit in the sup and inf operators. The formula is then existentially quantified only over ε′, which can be handled by binary search. This is an efficient way of checking the conditions in practice. Alternatively, the original formulation with multiple parameters can be solved directly as an ∃∀-formula using more expensive algorithms [13].
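A minimal sketch of the sup/inf check and the binary search over ε′, under loudly stated assumptions: the one-dimensional system ẋ = −x³ + μx with μ = 0.01, the candidate V(x) = x², and the state space D = (−2, 2) are hypothetical, and the sup and inf are merely *estimated* on a finite grid. Grid sampling is not sound, so this only illustrates the search structure that a δ-complete solver would carry out with guaranteed bounds. The Lie-derivative conjunct becomes easier as ε′ grows, while the level-set conjunct becomes harder, which is what makes bisection on ε′ meaningful.

```python
MU = 0.01  # hypothetical system: dx/dt = -x**3 + MU * x
N = 4001   # grid size used to *estimate* sup/inf (not sound)
XS = [-2.0 + 4.0 * i / (N - 1) for i in range(N)]  # grid over [-2, 2], approximating D


def V(x):
    return x * x  # candidate epsilon-Lyapunov function


def lie_V(x):
    return 2 * x * (-x ** 3 + MU * x)  # nabla_f V(x) = V'(x) * f(x)


def conditions(eps_prime, eps):
    """Grid estimates of the two conjuncts of the sup/inf form of LF_eps."""
    sup_inner = max((V(x) for x in XS if abs(x) < eps_prime), default=float("-inf"))
    inf_outer = min((V(x) for x in XS if abs(x) >= eps), default=float("inf"))
    sup_lie = max((lie_V(x) for x in XS if abs(x) >= eps_prime), default=float("-inf"))
    return sup_inner < inf_outer, sup_lie < 0


def search_eps_prime(eps, iters=30):
    """Bisection for (approximately) the smallest eps' in (0, eps) where the Lie conjunct holds."""
    lo, hi = 0.0, eps
    if not conditions(hi, eps)[1]:
        return None  # derivative conjunct fails even at the largest candidate
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if conditions(mid, eps)[1]:
            hi = mid
        else:
            lo = mid
    return hi


threshold = search_eps_prime(0.2)
eps_prime = 0.5 * (threshold + 0.2)  # a value strictly between the threshold and eps
print(threshold, conditions(eps_prime, 0.2))  # threshold close to 0.1; both conjuncts hold
```

For this system ∇_f V(x) = 2x²(μ − x²) is positive for 0 < |x| < 0.1, so the search returns a threshold near 0.1, and any ε′ between that threshold and ε = 0.2 satisfies both conjuncts, even though the origin itself is not Lyapunov stable.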

#### **4 Robust Proofs for Safety**

In this section, we define two types of ε-barrier functions that are robust to numerical perturbations.

Proving unbounded safety requires the use of barrier functions. The idea is that if one can find a barrier function that separates the initial conditions from the set of unsafe states, such that no trajectory can cross the barrier from the safe side to the unsafe side, then the system is safe. Here we use a formulation similar to that of Prajna [21]. The standard conditions on barrier functions constrain the vector field of the system at the exact boundary of the barrier set, which introduces robustness problems. We show that it is possible to avoid these problems using two different formulations, which we call Type 1 and Type 2 ε-barrier functions. Type 1 ε-barrier functions strengthen the original definition and require strict contraction of the barrier: instead of only asking the system to be contractive exactly on the barrier's border, we force it to be contractive when reaching any state within a small distance of the border. Type 2 ε-barrier functions allow the system to escape the barrier by a controllable distance for a limited period of time, after which it must return to the interior of the safe region. Type 1 ε-barriers can be seen as a subclass of Type 2 ε-barriers. The benefit of allowing bounded escape is that the shape of the barrier no longer needs to be an invariant set, which can be particularly helpful when the shape of the system invariants cannot be determined or expressed symbolically. The downside of Type 2 ε-barriers is that checking the corresponding conditions requires integration of the dynamics, which can be expensive but can still be handled by δ-complete decision procedures. The intuition behind the two definitions is shown in Fig. 2 and explained in detail in this section.

#### **4.1 Safety and Barrier Functions**

Before formally introducing robust safety and ε-barrier functions, we first recall the standard definitions of safety and barrier functions. The robustness problem with barrier functions is similar to that of Lyapunov functions: if the boundary exactly separates the safe and unsafe regions, then the inductive conditions are not robust, since perturbing the variables by even a small amount from the barrier makes it impossible to complete the proof.

**Definition 9 (Safety).** *Let $B: D \to \mathbb{R}$ be a scalar function defined for the system in (1). We say* B ≤ 0 *defines a safe (or forward invariant) set for the system iff the following formula is true:*

$$\mathsf{Safe}(f, \mathsf{init}, B) \equiv\_{df} \forall^D x\_0 \forall^{[0,\infty)} t \left(\mathsf{init}(x\_0) \to B(F(x\_0, t)) \le 0\right).$$

**Definition 10 (Barrier Function).** *Let $B: D \to \mathbb{R}$ be a differentiable scalar function defined for the system in (1). We say* B *is a barrier function for the system iff the following formula is true:*

$$\text{Barrier}(f, \text{init}, B) \equiv\_{df} \forall^D x \left( \left( \text{init}(x) \to B(x) \le 0 \right) \land \left( B(x) = 0 \to \nabla\_f B(x) < 0 \right) \right)$$

**Proposition 3.** Barrier(f, init, B) → Safe(f, init, B)*.*

**Fig. 2.** Type 1 and Type 2 ε-Barriers

#### **4.2 Type 1: Strict Contraction**

In the standard definition, the boundary of the barrier set is typically a manifold defined by an equality, which is not numerically robust. To avoid this problem, we need the barrier boundary to be *belt-shaped*, in the sense that there is a clear gap between the safe and unsafe regions. The idea is shown in Fig. 2c: we need a second, stronger barrier defined by B = −ε for some ε > 0, so that the system is clearly separated from B = 0. The formal definition is as follows.

**Definition 11 (**ε**-Barrier Certificates).** *Let $\varepsilon \in \mathbb{R}^+$ be arbitrary. A differentiable scalar function $B: D \to \mathbb{R}$ is an* ε*-barrier function iff the following conditions are true:*


*Formally, the condition is defined as*

$$\begin{aligned} \mathsf{Barrier}\_{\varepsilon}(f, \mathsf{init}, B) & \equiv\_{df} \forall^{D} x \Big(\mathsf{init}(x) \to B(x) \le -\varepsilon\Big) \\ & \wedge \exists^{(0,\infty)} \gamma \forall^{D} x \Big(B(x) = -\varepsilon \to \nabla\_{f} B(x) \le -\gamma\Big) \end{aligned}$$

It should be intuitively clear from the definition that the existence of ε-barrier functions is sufficient for establishing invariants and safety properties. The new requirement is that the system stays robustly inside the barrier, separated from the unsafe side by the belt −ε ≤ B(x) ≤ 0.
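The two conjuncts of Barrier_ε can be spot-checked numerically. The sketch below assumes a hypothetical linear system f(x) = (−x₁ + x₂, −x₁ − x₂), the barrier B(x) = x₁² + x₂² − 1, the margin ε = 0.1, and init = [−0.1, 0.1]²; none of these come from the paper. It samples the level set B = −ε (a circle) and returns an estimate of the largest valid γ. Sampling gives evidence only; establishing the universally quantified formula requires a δ-complete solver.

```python
import math

EPS = 0.1  # hypothetical margin epsilon


def B(x1, x2):
    return x1 ** 2 + x2 ** 2 - 1.0  # hypothetical barrier; the safe set is B <= 0


def f(x1, x2):
    return (-x1 + x2, -x1 - x2)  # hypothetical linear system


def lie_B(x1, x2):
    g1, g2 = 2 * x1, 2 * x2  # gradient of B
    v1, v2 = f(x1, x2)
    return g1 * v1 + g2 * v2


def check_type1(n=720):
    # Conjunct 1: init = [-0.1, 0.1]^2. B is convex, so its maximum over the box is at a corner.
    init_ok = max(B(a, b) for a in (-0.1, 0.1) for b in (-0.1, 0.1)) <= -EPS
    # Conjunct 2: sample the level set B = -EPS, the circle of radius sqrt(1 - EPS).
    r = math.sqrt(1.0 - EPS)
    worst = max(lie_B(r * math.cos(2 * math.pi * k / n), r * math.sin(2 * math.pi * k / n))
                for k in range(n))
    return init_ok, -worst  # second value estimates the largest valid gamma


print(check_type1())
```

Here ∇_f B(x) = −2(x₁² + x₂²), which equals −1.8 everywhere on the level set, so the estimated γ is 1.8.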

**Theorem 4.** *For any $\varepsilon \in \mathbb{R}^+$,* Barrierε(f, init, B) → Safe(f, init, B)*.*

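*Proof.* Assume Barrierε(f, init, B) is true. Then Barrier(f, init, B + ε), as specified in Definition 10, is also true: every initial state satisfies B(x) + ε ≤ 0, and on the set B(x) + ε = 0 we have $\nabla_f (B + \varepsilon)(x) = \nabla_f B(x) \le -\gamma < 0$. Therefore, by Proposition 3, Safe(f, init, B + ε) is true, and hence so is Safe(f, init, B).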

It is clear that there is room for numerically perturbing the size of the area and still obtaining a robust proof. The proof is similar to the one for Lemma 1 as shown in [8].

**Theorem 5.** *For any $\varepsilon \in \mathbb{R}^+$, there exists $\delta \in \mathbb{Q}^+$ such that* Barrierε(f, init, B) *is a* δ*-robust formula.*

*Example 2 (Type 1* ε*-Barrier for Time-Reversed Van der Pol).* Consider the time-reversed Van der Pol system introduced in Example 1. We use the same example to demonstrate the effect of numerical errors in proving barrier certificates. The level sets of the Lyapunov function in the stable region are barrier certificates; however, for barriers that are very close to the limit cycle, numerical sensitivity becomes a problem. In experiments, with $\varepsilon = 10^{-5}$ and $\delta = 10^{-4}$, we can verify that the level set $z^T P z = 90$ is a Type 1 ε-barrier. Table 2 lists the parameters used in this proof. Figure 3 (Left) shows the direction field of the time-reversed Van der Pol dynamics, the boundary of the set $z^T P z \le 90$, which we prove is a Type 1 ε-barrier, and the boundary of the set $z^T P z \le 110$, which is clearly not a barrier, since it lies outside the limit cycle.

**Fig. 3.** (Left) Van der Pol example. (Right) Type 2 barrier example.

The conditions for ε-Lyapunov and ε-barrier functions look very similar, but there is an important difference. In the case of ε-Lyapunov functions, we do not impose any condition on the Lie derivative of the norm that defines the balls, so the balls themselves do not define barrier sets. On the other hand, the level sets of Lyapunov functions always define barriers.

*Remark 3.* The ε-barrier functions can also be used as a sufficient condition for ε-stability, if a barrier can be found within the ε-ball required in ε-stability.

*Remark 4.* A technical requirement for proving robustness of the ε-barrier conditions is that ¬init defines a simple set that can be over-approximated, such that for every $\varepsilon \in \mathbb{R}^+$ there is $\delta \in \mathbb{R}^+$ such that for any point that satisfies $\neg\mathrm{init}^{+\delta}$ there is an ε-close point that satisfies ¬init. A sufficient condition for this restriction is that init be of the form $(\bigwedge_i a_i \le x_i \le b_i) \to \varphi(x)$, where $a_i, b_i \in \mathbb{Q}$ are arbitrary constants and ϕ is a quantifier-free formula with only strict inequalities [22].

#### **4.3 Type 2: Bounded Escape**

We now introduce the second set of conditions for establishing ε-invariant sets. This set of conditions can be used only when the ε-variations are considered. This notion is inspired by the notion of k-step invariants [3] for discrete-time systems. The ε-margin that we allow at the boundary of the invariants allows us to exploit more techniques. Using reachable set computation, we can directly check if all states stay within the barrier set at each step. To ensure that the conditions are inductive and useful, we need to impose the following two requirements:

– (Contraction) Similar to the strengthening in barrier certificates, we require that the system does not *sit at the boundary*: the dynamics at the boundary should be contracting. The difference from Type 1 ε-barriers is that this condition is not imposed through the vector field on the boundary. Instead, it is a reachability condition: after some amount of time, all states should return to the interior of an appropriate set.

– (Bounded Escape) Before returning to the invariant set, the system may step outside it, but only up to a bounded distance from the boundary.

The intuition is depicted in Fig. 2d. In the formal definition, we parameterize the conditions with the time for contraction and the maximum deviation from the invariant set, as follows.

**Definition 12 (Type 2 Barrier Functions).** *Let $T, \varepsilon \in \mathbb{R}^+$ be arbitrary. We say a continuous scalar function* B *defines a* (T,ε)*-elastic barrier function iff the following conditions hold:*


*In all, we define the conditions with the following formula*

$$\begin{aligned} \mathsf{Barrier}_{T,\varepsilon}(f,\mathsf{init},B) & \equiv_{df} \forall^D x\Big(\mathsf{init}(x) \to B(x) \le -\varepsilon\Big) \\ & \wedge \exists^{(\varepsilon,\infty)}\varepsilon'\, \forall^D x\Big(B(x) = -\varepsilon \to B(F(x,T)) \le -\varepsilon'\Big) \\ & \wedge \exists^{(0,\varepsilon)}\varepsilon''\, \forall^D x\, \forall^{[0,T]} t\Big(B(x) = -\varepsilon \to B(F(x,t)) \le -\varepsilon''\Big) \end{aligned}$$

Theorem 6 shows that the conditions in Definition 12 ensure that the system never leaves the invariant B ≤ 0. The key is the second condition: induction works because all states come back to the interior of the set defined by B ≤ −ε. With the third condition only, we cannot perform induction, because the set may keep growing.
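Type 2 conditions refer to the flow F rather than to the vector field, so checking them involves integrating the dynamics. The sketch below uses a hypothetical stable but non-normal linear system ẋ = Ax with A = [[−1, 4], [−0.25, −1]], the barrier B(x) = x₁² + x₂² − 1, ε = 0.6, and T = 3 (all illustrative choices). It integrates sampled points on the level set B = −ε with fixed-step RK4 and reports the largest value of B over (0, T] (bounded escape) and the largest value of B at time T (contraction). Sampled simulation is not sound reachability analysis; it only illustrates what the conditions measure.

```python
import math

A = ((-1.0, 4.0), (-0.25, -1.0))  # hypothetical stable but non-normal linear system
EPS, T, DT = 0.6, 3.0, 0.005      # hypothetical margin, horizon, and RK4 step


def f(x):
    return (A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1])


def B(x):
    return x[0] ** 2 + x[1] ** 2 - 1.0  # safe set is B <= 0


def rk4_step(x, dt):
    def add(u, v, s):
        return (u[0] + s * v[0], u[1] + s * v[1])

    k1 = f(x)
    k2 = f(add(x, k1, dt / 2))
    k3 = f(add(x, k2, dt / 2))
    k4 = f(add(x, k3, dt))
    return (x[0] + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
            x[1] + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)


def check_type2(n=72):
    """Return (max of B over (0, T], max of B at time T) over n sampled starts with B = -EPS."""
    r = math.sqrt(1.0 - EPS)  # points on the circle where B(x) = -EPS
    worst_escape, worst_final = -math.inf, -math.inf
    for k in range(n):
        x = (r * math.cos(2 * math.pi * k / n), r * math.sin(2 * math.pi * k / n))
        for _ in range(round(T / DT)):
            x = rk4_step(x, DT)
            worst_escape = max(worst_escape, B(x))
        worst_final = max(worst_final, B(x))
    return worst_escape, worst_final


print(check_type2())
```

For this A, the level set B = −ε is not invariant (B initially increases along the direction x₁ = x₂), yet B stays negative on [0, T] and drops below −ε by time T, which is exactly the bounded-escape behavior that Definition 12 permits.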

**Theorem 6.** *For any $T, \varepsilon \in \mathbb{R}^+$,* $\mathsf{Barrier}_{T,\varepsilon}(f, \mathsf{init}, B) \to \mathsf{Safe}(f, \mathsf{init}, B)$*.*

*Proof.* For the purpose of contradiction, suppose that the system is unsafe starting from some $x_0 \in \mathrm{init}$. Using continuity of the barrier B and the solution function F, let $t \in \mathbb{R}_{\ge 0}$ be a time at which $B(x(t)) = 0$, where $x(t)$ is by definition $F(x_0, t)$. By the 1st property in Definition 12, we know $B(x_0) \le -\varepsilon < 0$. Using continuity of B and F, let $t' \in [0, t)$ be the supremum of all times at which $B(x(t')) = -\varepsilon$. By the 3rd property in Definition 12, we know $t - t' > T$, and by the 2nd property in Definition 12, we know $B(x(t' + T)) \le -\varepsilon' < -\varepsilon$. Using continuity of B and F, there is a time $t'' \in (t' + T, t)$ at which $B(x(t'')) = -\varepsilon$. However, this contradicts t′ being the supremum.

**Theorem 7.** *For any $\varepsilon \in \mathbb{R}^+$, there exists $\delta \in \mathbb{Q}^+$ such that* $\mathsf{Barrier}_{T,\varepsilon}(f, \mathsf{init}, B)$ *is a* δ*-robust formula.*

*Example 3.* We use this example to show how Type 2 ε-barriers can be used to establish safety. Consider the following system.

$$
\begin{bmatrix}
\dot{x}\_1\\\dot{x}\_2
\end{bmatrix} = \begin{bmatrix}
\end{bmatrix} \begin{bmatrix}
x\_1\\x\_2
\end{bmatrix}
$$

Let init be the set $\{x \mid -0.1 \le x_1 \le 0.1,\ -0.1 \le x_2 \le 0.1\}$, and let U, the unsafe set, be the set $\{x \mid -2.0 \le x_1 \le -1.1,\ -2.0 \le x_2 \le -1.1\}$. The system is stable and safe with respect to the designated unsafe set. However, safety cannot be shown in the standard sense using any invariant of the form $B(x) := x_1^2 + x_2^2 - c \le 0$, where $c \in \mathbb{Q}^+$ is a constant, because the vector field on the boundary of such sets does not satisfy the inductive conditions. Nevertheless, we can show that for c = 1, B(x) is a Type 2 ε-barrier; the dReal query verifies the conditions with ε = 0.1. Figure 3 (Right) illustrates the example. The green set at the center represents init, and the red set represents the unsafe set U. The B(x) = 0 level set is not invariant, as evidenced in the figure by the forward images at t = 0.14 and t = 0.28 leaving the set; however, as the dReal query proves, the reachable set over 0 ≤ t ≤ 10 does not leave the B(x) = 1.0 level set and is completely contained in the B(x) = −0.1 level set by t = 0.4. Since U(x) → B(x) > 1.0 and init(x) → B(x) < −0.1, the system cannot reach any state in U.

#### **5 Experiments**

In this section, we show examples of nonlinear systems that can be verified to be ε-stable or safe with ε-barriers.

**Table 1.** Results for the ε-Lyapunov functions. Each Lyapunov function is of the form z<sup>T</sup>Pz, where z is a vector of monomials over the state variables. We report the constant values satisfying the ε-Lyapunov conditions, and the time that verification of each example takes (in seconds).


Table 1 contains the parameters we use to verify the requirements of Definition 8 for ε-Lyapunov functions in our examples. Table 2 contains the parameters we use to verify the requirements of Definition 11 for Type 1 ε-barrier functions in our examples. The ε-Lyapunov functions in these examples are of the form V(x) := z<sup>T</sup>Pz, where z is a vector of products of the state variables and P is a constant matrix obtained using simulation-guided techniques from [10]. All the P matrices are given in [8].

**Table 2.** Results for the ε-barrier functions. Each barrier function B(x) is of the form z<sup>T</sup>Pz − ℓ, where z is a vector of monomials over x and ℓ is a level. We indicate the highest degree of the monomials used in z, the size of P, the level ℓ used for each barrier function, and the values of ε and γ used to check ∇<sub>f</sub>B(x) < −γ.

**Time-Reversed Van der Pol.** The time-reversed Van der Pol system has been used as an example in the previous sections. Figure 3 (Left) shows the direction field of this system around the origin. Using dReal with δ := 10<sup>−25</sup>, we are able to establish a 10<sup>−12</sup>-Lyapunov function and a 10<sup>−5</sup>-barrier function.

**Normalized Pendulum.** A standard pendulum system has continuous dynamics containing a transcendental function, which causes difficulty for many techniques. Here, we consider a normalized pendulum system with the following dynamics, in which x₁ and x₂ represent angular position and angular velocity, respectively. In our experiment, using δ = 10<sup>−50</sup>, we can prove that the function V := x<sup>T</sup>Px is ε-Lyapunov, where ε := 10<sup>−12</sup>.

$$
\begin{bmatrix}
\dot{x}\_1\\\dot{x}\_2
\end{bmatrix} = \begin{bmatrix}
x\_2\\-\sin(x\_1) - x\_2
\end{bmatrix} \tag{3}
$$

Using δ := 0.01, we are able to prove that for *any* level ℓ ∈ [0.1, 10], the function B(x) := x<sup>T</sup>Px − ℓ, with x being the system state and P a constant matrix given in [8], is a Type 1 0.01-barrier function.
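Eq. (3) can be explored numerically before attempting a proof. The Python sketch below integrates it with a fixed-step classical Runge-Kutta scheme (our choice; the step size and horizon are arbitrary) and reports how close a trajectory ends up to the origin. Simulating a few trajectories only illustrates convergence; it is not a substitute for the ε-Lyapunov proof, and it does not use the matrix P from [8].

```python
import math


def pendulum(x):
    """Right-hand side of Eq. (3): x1' = x2, x2' = -sin(x1) - x2."""
    x1, x2 = x
    return (x2, -math.sin(x1) - x2)


def rk4_step(f, x, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = f(x)
    k2 = f(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k1)))
    k3 = f(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k2)))
    k4 = f(tuple(xi + h * ki for xi, ki in zip(x, k3)))
    return tuple(
        xi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    )


def simulate(x0, t_end=20.0, h=0.01):
    """Integrate Eq. (3) from x0 up to time t_end with a fixed step h."""
    x = x0
    for _ in range(int(round(t_end / h))):
        x = rk4_step(pendulum, x, h)
    return x


final = simulate((1.0, 0.0))
print("state at t = 20:", final, "distance to origin:", math.hypot(*final))
```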

**Moore-Greitzer Jet Engine.** Next, we consider a simplified version of the Moore-Greitzer model for a jet engine. The system has the following dynamics, in which x₁ and x₂ are states related to mass flow and pressure rise.

$$
\begin{bmatrix}
\dot{x}\_1\\\dot{x}\_2
\end{bmatrix} = \begin{bmatrix}
\end{bmatrix} \tag{4}
$$

In our experiment, using δ = 10<sup>−20</sup> and z := [x₁², x₁x₂, x₂², x₁, x₂]<sup>T</sup>, we can prove that the function V := z<sup>T</sup>Pz is ε-Lyapunov, where ε := 10<sup>−10</sup>.

Using dReal with δ := 0.1, we are able to prove that for *any* level ℓ ∈ [1, 10], the function B(x) := z<sup>T</sup>Pz − ℓ, with x being the system state, z the vector of monomials defined above, and P a constant matrix given in [8], is a Type 1 0.1-barrier function.
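The candidate functions in these experiments share one shape: a quadratic form in a vector of monomials, optionally shifted by a level. The Python sketch below evaluates V(x) = z<sup>T</sup>Pz and B(x) = z<sup>T</sup>Pz − ℓ for the Moore-Greitzer monomial vector z. The identity matrix used for P is a hypothetical placeholder, not the verified matrix from [8]. Because B(x) ≤ 0 exactly when V(x) ≤ ℓ, the sets {B ≤ 0} for different levels are nested sublevel sets of the same V, which is one way to read a result that holds for every level in a range.

```python
# Hypothetical placeholder for P: the 5x5 identity matrix (the real P is given in [8]).
P_PLACEHOLDER = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]


def monomials(x1, x2):
    """The monomial vector z = [x1^2, x1*x2, x2^2, x1, x2] from the text."""
    return [x1 * x1, x1 * x2, x2 * x2, x1, x2]


def quad_form(P, z):
    """Compute z^T P z for a square matrix P given as nested lists."""
    n = len(z)
    return sum(z[i] * P[i][j] * z[j] for i in range(n) for j in range(n))


def V(x1, x2, P=P_PLACEHOLDER):
    """Lyapunov candidate V(x) = z^T P z."""
    return quad_form(P, monomials(x1, x2))


def B(x1, x2, level, P=P_PLACEHOLDER):
    """Barrier candidate B(x) = z^T P z - level; B(x) <= 0 iff V(x) <= level."""
    return V(x1, x2, P) - level


print(V(1.0, 2.0), B(1.0, 2.0, 10.0))  # prints 26.0 16.0 with the placeholder P
```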

**Powertrain Control System.** Next, we consider a closed-loop model of a powertrain control (PTC) system for an automotive application. The system dynamics consist of four state variables, two associated with a plant and two with a controller. The plant models the fuel and air dynamics of an internal combustion engine, and the controller is designed to regulate the air-fuel (A/F) ratio within a given range of an optimal value, referred to as the stoichiometric value. The two states related to the plant represent the manifold pressure, p, and the ratio between the actual A/F ratio and the stoichiometric value, r. The two states associated with the controller are the estimated manifold pressure, p<sub>est</sub>, and the internal state of the PI controller, i. The system is highly nonlinear, with the following dynamics:

$$\begin{aligned} \dot{p} &= c\_1 \left( 2\dot{u}\_1 \sqrt{\frac{p}{c\_{11}} - \left( \frac{p}{c\_{11}} \right)^2} - \left( c\_3 + c\_4 c\_2 p + c\_5 c\_2 p^2 + c\_6 c\_2^2 p \right) \right) \\ \dot{r} &= 4 \left( \frac{c\_3 + c\_4 c\_2 p + c\_5 c\_2 p^2 + c\_6 c\_2^2 p}{c\_{13} \left( c\_3 + c\_4 c\_2 p\_{est} + c\_5 c\_2 p\_{est}^2 + c\_6 c\_2^2 p\_{est} \right) \left( 1 + i + c\_{14} \left( r - c\_{16} \right) \right)} - r \right) \\ \dot{p}\_{est} &= c\_1 \left( 2\dot{u}\_1 \sqrt{\frac{p}{c\_{11}} - \left( \frac{p}{c\_{11}} \right)^2} - c\_{13} \left( c\_3 + c\_4 c\_2 p\_{est} + c\_5 c\_2 p\_{est}^2 + c\_6 c\_2^2 p\_{est} \right) \right) \\ \dot{i} &= c\_{15} (r - c\_{16}) \end{aligned}$$

The model and its constant parameter values follow the detailed description in [10]. We verified that there exists a function of the form B(x) = z<sup>T</sup>Pz − 0.01 (where z consists of 14 monomials with a maximum degree of 2) such that ∇<sub>f</sub>B(x) < −γ whenever B(x) = −ε.
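The count of 14 monomials is consistent with taking every monomial of degree 1 or 2 in the four state variables (p, r, p<sub>est</sub>, i) and no constant term: 4 linear plus 10 quadratic terms. The text does not list z explicitly, so this choice is an assumption; the Python sketch below simply enumerates that candidate set.

```python
from itertools import combinations_with_replacement

STATE = ("p", "r", "p_est", "i")


def monomials_up_to(variables, max_degree):
    """All monomials of degree 1..max_degree (no constant term), each a tuple of factors."""
    result = []
    for degree in range(1, max_degree + 1):
        result.extend(combinations_with_replacement(variables, degree))
    return result


z = monomials_up_to(STATE, 2)
print(len(z))  # 4 linear + 10 quadratic monomials = 14
for m in z:
    print("*".join(m))
```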

#### **6 Conclusion**

We formulated new inductive proof rules for stability and safety for dynamical systems. The rules are numerically robust, making them amenable to verification using automated reasoning tools such as those based on δ-decision procedures. We presented several examples demonstrating the value of the new approach, including safety verification tasks for highly nonlinear systems. The examples show that the framework can be used to prove stability and safety for examples that were out of reach for existing tools. The new framework relies on the ability to generate reasonable candidate Lyapunov functions, which are analogous to ranking functions from program analysis. Future work will include improved techniques for efficiently generating the ε-Lyapunov and ε-barrier functions and related theoretical questions.

**Acknowledgement.** Our work is supported by the United States Air Force and DARPA under Contract No. FA8750-18-C-0092, AFOSR No. FA9550-19-1-0041, and the National Science Foundation under NSF CNS No. 1830399. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Air Force and DARPA.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Icing: Supporting Fast-Math Style Optimizations in a Verified Compiler**

Heiko Becker<sup>1</sup>(✉), Eva Darulova<sup>1</sup>, Magnus O. Myreen<sup>2</sup>, and Zachary Tatlock<sup>3</sup>

<sup>1</sup> MPI-SWS, Saarland Informatics Campus (SIC), Saarbrücken, Germany {hbecker,eva}@mpi-sws.org

<sup>2</sup> Chalmers University of Technology, Gothenburg, Sweden myreen@chalmers.se

<sup>3</sup> University of Washington, Seattle, USA ztatlock@cs.washington.edu

**Abstract.** Verified compilers like CompCert and CakeML offer increasingly sophisticated optimizations. However, their deterministic source semantics and strict IEEE 754 compliance prevent the verification of "fast-math" style floating-point optimizations. Developers often selectively use these optimizations in mainstream compilers like GCC and LLVM to improve the performance of computations over noisy inputs or for heuristics by allowing the compiler to perform intuitive but IEEE 754-unsound rewrites.

We designed, formalized, implemented, and verified a compiler for Icing, a new language which supports selectively applying fast-math style optimizations in a verified compiler. Icing's semantics provides the first formalization of fast-math in a verified compiler. We show how the Icing compiler can be connected to the existing verified CakeML compiler and verify the end-to-end translation by a sequence of refinement proofs from Icing to the translated CakeML. We evaluated Icing by incorporating several of GCC's fast-math rewrites. While Icing targets CakeML's source language, the techniques we developed are general and could also be incorporated in lower-level intermediate representations.

**Keywords:** Compiler verification · Floating-point arithmetic · Optimization

#### **1 Introduction**

Verified compilers formally guarantee that compiled machine code behaves according to the specification given by the source program's semantics. This stringent requirement makes verifying "end-to-end" compilers for mainstream languages challenging, especially when proving sophisticated optimizations that developers rely on. Recent verified compilers like CakeML [38] for ML and CompCert [24] for C have been steadily verifying more of these important optimizations [39–41]. While the gap between verified compilers and mainstream alternatives like GCC and LLVM has been shrinking, so-called "fast-math" floating-point optimizations remain absent in verified compilers.

Z. Tatlock: This work was supported in part by the Applications Driving Architectures (ADA) Research Center, a JUMP Center co-sponsored by SRC and DARPA.

Fast-math optimizations allow a compiler to perform rewrites that are often intuitive when interpreted as real-valued identities, but which may not preserve strict IEEE 754 floating-point behavior. Developers selectively enable fast-math optimizations when implementing heuristics, computations over noisy inputs, or error-robust applications like neural networks, typically at the granularity of individual source files. The IEEE 754-unsound rewrites used in fast-math optimizations allow compilers to perform strength reductions, reorder code to enable other optimizations, and remove some error checking [1,2]. Together, these optimizations can provide significant savings and are widely used in performance-critical applications [12].

Unfortunately, strict IEEE 754 source semantics prevents proving fast-math optimizations correct in verified compilers like CakeML and CompCert. Simple strength-reducing rewrites like fusing the expression *x* ∗ *y* + *z* into a faster and locally-more-accurate fused multiply-add (fma) instruction cannot be included in such verified compilers today. This is because fma avoids an intermediate rounding and thus may not produce exactly the same bit-for-bit result as the unoptimized code. More sophisticated optimizations like vectorization and loop invariant code motion depend on reordering operations to make expressions available, but these cannot be verified since floating-point arithmetic is not associative. Even simple reductions like rewriting *x* − *x* to 0 cannot be verified since the result can actually be NaN ("not a number") if *x* is NaN. Each of these cases represents a rewrite that developers would often, in principle, be willing to apply manually to improve performance, but which can be more conveniently handled by the compiler. Verified compilers' strict IEEE 754 source semantics similarly hinders composing their guarantees with recent tools designed to *improve the accuracy* of a source program [14,16,32], as these tools change program behavior to reduce rounding error. In short, developers today are forced to choose between verified compilers and useful tools based on floating-point rewrites.
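The three obstacles named above can be reproduced directly with Python floats (IEEE 754 doubles on mainstream platforms). The sketch checks non-associativity of addition, the NaN result of *x* − *x* for non-finite *x*, and the difference between one rounding (fma) and two roundings (*x* ∗ *y* + *z*). Instead of relying on a hardware fma, the hypothetical helper `fma_emulated` (finite inputs only) computes *x* ∗ *y* + *z* exactly with `fractions.Fraction` and rounds once when converting back to a float.

```python
import math
from fractions import Fraction

# 1. Floating-point addition is not associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

# 2. x - x is not always 0: it is NaN when x is NaN or infinite.
nan, inf = float("nan"), float("inf")


# 3. A fused multiply-add rounds once; x * y + z rounds twice.
def fma_emulated(x, y, z):
    """Exact x*y + z via rationals, then a single rounding to float (finite inputs only)."""
    return float(Fraction(x) * Fraction(y) + Fraction(z))


x = 1.0 + 2.0 ** -30
z = -(1.0 + 2.0 ** -29)
unfused = x * x + z             # x*x is first rounded to 1 + 2**-29, so the sum is 0.0
fused = fma_emulated(x, x, z)   # the exact value 2**-60 survives the single rounding

print(left == right, math.isnan(nan - nan), math.isnan(inf - inf), unfused, fused)
```

The fma case shows why fusing is "locally more accurate" yet still not bit-for-bit equivalent: the two results differ in every bit.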

The crux of the mismatch between verified compilers and fast-math lies in the source semantics: verified compilers implement strict IEEE 754 semantics while developers are intuitively programming against a looser specification of floating-point closer to the reals. Developers currently indicate this perspective by passing compiler flags like -ffast-math for the parts of their code written against this looser semantics, enabling mainstream compilers to aggressively optimize those components. Ideally, verified compilers will eventually support such loosened semantics by providing an "approximate real" data type and letting the developer specify error bounds under which the compiler could freely apply any optimization that stays within bounds. A good interface to tools for analyzing finite-precision computations [11,16] could even allow independently-established formal accuracy guarantees to be composed with compiler correctness.

As an initial step toward this goal, we present a pragmatic and flexible approach to supporting fast-math optimizations in verified compilers. Our approach follows the implicit design of existing mainstream compilers by providing two complementary features. First, our approach provides fine-grained control over which parts of a program the compiler may optimize under extended floating-point semantics. Second, our approach provides flexible extensions to the floating-point semantics specified by a set of high-level rewrites which can be specialized to different parts of a program. The result is a new nondeterministic source semantics which grants the compiler freedom to optimize floating-point code within clearly defined bounds.

Under such extended semantics, we verify a set of common fast-math optimizations with the simulation-based proof techniques already used in verified compilers like CakeML and CompCert, and integrate our approach with the existing compilation pipeline of the CakeML compiler. To enable these proofs, we provide various *local* lemmas that a developer can prove about their rewrites to ensure *global* correctness of the verified fast-math optimizer. Several challenges arise in the design of this decomposition, including how to handle "duplicating rewrites" like distributivity that introduce multiple copies of a subexpression, and how to connect context-dependent rewrites to other analyses (e.g., from accuracy-verification tools) via rewrite preconditions. Our approach thus provides a rigorous formalization of the intuitive fast-math semantics developers already use, provides an interface for dispatching proof obligations to formal numerical analysis tools via rewrite preconditions, and enables bringing fast-math optimizations to verified compilers.

In summary, the contributions of this paper are:


#### **2 The Icing Language**

In this section we define the Icing language and its semantics to support fast-math style optimizations in a verified compiler. Icing is a prototype language whose semantics is designed to be extensible and widely applicable instead of focusing on a particular implementation of fast-math optimizations. This allows us to provide a stable interface as the implementation of the compiler changes, and to support different optimization choices in the semantics, depending on the compilation target.

#### **2.1 Syntax**

Icing's syntax is shown in Fig. 1. In addition to arithmetic, let-bindings and conditionals, Icing supports fma operators, lists ([*e*<sub>1</sub> *...*]), projections (*e*<sub>1</sub>[*n*]), and Map and Fold as primitives. Conditional guards consist of boolean constants (*b*), binary comparisons (*e*<sub>1</sub> □ *e*<sub>2</sub>), and an isNaN predicate. isNaN *e*<sub>1</sub> checks whether *e*<sub>1</sub> is a so-called *Not-a-Number* (NaN) special value. Under the IEEE 754 standard, invalid operations (e.g., the square root of a negative number) produce NaN results, and most operations propagate NaN results when passed a NaN argument. It is thus common to add checks for NaNs at the source or compiler level.

$$\begin{aligned} &w : \text{64-bit floating-point word} \qquad x : \text{String} \qquad n \in \mathbb{N} \qquad b \in \{\text{True}, \text{False}\} \\ &\Diamond \in \{ -, \mathsf{sqrt} \} \qquad \circ \in \{ +, -, \*, / \} \qquad \Box \in \{ <, \leq, = \} \\ &e\_1, e\_2, e\_3 ::= w \mid x \mid [e\_1 \ldots] \mid e\_1[n] \mid \Diamond\, e\_1 \mid e\_1 \circ e\_2 \mid \mathsf{fma}(e\_1, e\_2, e\_3) \mid \mathsf{let}\ x = e\_1\ \mathsf{in}\ e\_2 \\ &\qquad \mid \mathsf{if}\ c\ \mathsf{then}\ e\_1\ \mathsf{else}\ e\_2 \mid \mathsf{Map}\ (\lambda x.\, e\_1)\ e\_2 \mid \mathsf{Fold}\ (\lambda x\, y.\, e\_1)\ e\_2\ e\_3 \mid \mathsf{opt}\colon e\_1 \\ &c ::= b \mid e\_1 \Box e\_2 \mid \mathsf{isNaN}\ e\_1 \mid \mathsf{opt}\colon c \end{aligned}$$

**Fig. 1.** Syntax of Icing expressions

We use the Map and Fold primitives to show that Icing can be used to express programs beyond arithmetic, while keeping the language simple. Language features like function definitions or general loops do not affect floating-point computations with respect to fast-math optimizations and are thus orthogonal.

The opt: scoping annotation implements one of the key features of Icing: floating-point semantics are relaxed only for expressions under an opt: scope. In this way, opt: provides fine-grained control both for expressions and conditional guards.

#### **2.2 Optimizations as Rewrites**

Fast-math optimizations are typically local and syntactic, i.e., peephole rewrites. In Icing, these optimizations are written as *s* → *t* to denote finding any subexpression matching pattern *s* and rewriting it to *t*, using the substitution from matching *s* to instantiate pattern variables in *t* as usual. The find and replace patterns of a rewrite are terms from the following pattern language which mirrors Icing syntax:

$$p\_1, p\_2, p\_3 ::= w \mid b \mid x \mid \Diamond\, p\_1 \mid p\_1 \circ p\_2 \mid p\_1 \Box p\_2 \mid \mathsf{fma}(p\_1, p\_2, p\_3) \mid \mathsf{isNaN}\ p\_1$$

Table 1 shows the set of rewrites currently supported in our development. While this set does not include all of GCC's fast-math optimizations, it does cover the three primary categories:

– performance- and precision-improving strength reduction, which fuses *x* ∗ *y* + *z* into an fma instruction (Rewrite 1)


A key feature of Icing's design is that each rewrite can be guarded by a *rewrite precondition*. We distinguish *compiler rewrite preconditions* as those that must be true for the rewrite to be correct with respect to Icing semantics. Removing a NaN check, for example, can change the runtime behavior of a floating-point program: a previously crashing program may terminate or vice versa. Thus a NaN check can only be removed if the value can never be a NaN.

In contrast, an *application rewrite precondition* guards a rewrite that can always be proven correct against the Icing semantics, but where a user may still want finer-grained control. By restricting the context where Icing may fire these rewrites, a user can establish end-to-end properties of their application, e.g., worst-case roundoff error. The crucial difference is that the compiler preconditions must be discharged before the rewrite can be proven correct against the Icing semantics, whereas the application precondition is an additional restriction limiting where the rewrite is applied for a specific application.
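As a concrete illustration of a compiler rewrite precondition, consider the rewrite *x* − *x* → 0 from the introduction. It is only correct when *x* can never be NaN or infinite, so the Python sketch below guards it with a query to an external analysis. The tuple encoding, `sub_self_to_zero`, and `may_be_nonfinite` (which stands in for a tool such as an interval analysis) are hypothetical; Icing's actual precondition interface is not shown here.

```python
import math


def may_be_nonfinite(expr, known_finite):
    """Hypothetical stand-in for an external analysis (e.g. interval analysis):
    only variables listed in known_finite are guaranteed to be finite."""
    return not (expr[0] == "var" and expr[1] in known_finite)


def sub_self_to_zero(expr, known_finite):
    """Guarded rewrite x - x -> 0.0, applied only when its compiler precondition holds."""
    if expr[0] == "-" and expr[1] == expr[2] and not may_be_nonfinite(expr[1], known_finite):
        return ("w", 0.0)  # for finite x, x - x is +0.0 under round-to-nearest
    return expr


facts = {"a"}  # hypothetical analysis result: a is finite, nothing is known about b
print(sub_self_to_zero(("-", ("var", "a"), ("var", "a")), facts))  # ('w', 0.0)
print(sub_self_to_zero(("-", ("var", "b"), ("var", "b")), facts))  # unchanged

# Why the precondition is needed: the unguarded rewrite would change these results.
print(math.isnan(float("nan") - float("nan")), math.isnan(float("inf") - float("inf")))
```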

A key benefit of this design is that *rewrite preconditions can serve as an interface to external tools* to determine where optimizations may be conditionally applied. This feature enables Icing to address limitations that have prevented previous work [5] from proving fast-math optimizations in verified compilers, since, as noted in [5], "The only way to exploit these [floating-point] simplifications while preserving semantics would be to apply them conditionally, based on the results of a static analysis (such as FP interval analysis) that can exclude the problematic cases." In our setting, a static analysis tool can be used to establish an application rewrite precondition, while compiler rewrite preconditions can be discharged during (or potentially after) compilation via static analysis or manual proof.

This design choice essentially decouples the floating-point static analyzer from the general-purpose compiler. One motivation is that the compiler may perform hardware-specific rewrites, which source-code-based static analyzers would generally not be aware of. Furthermore, integrating end-to-end verification of these rewrites into a compiler would require it to always run a global static analysis. For this reason, we propose an interface which communicates only the necessary information.

Rewrites which duplicate matched subexpressions, e.g., distributing multiplication over addition, required careful design in Icing. Such rewrites can lead to unexpected results if different copies of the duplicated expression are optimized differently; this also complicates the Icing correctness proof. We show how preconditions additionally enabled us to address this challenge in Sect. 4.

Icing optimizes code by folding a list of rewrites over a program e:

```
rewrite ([],e) = e
rewrite ((s → t)::rws, e) =
 let e' = if (matches e s) then (app (s → t) e) else e in
 rewrite (rws, e')
```
For a rewrite *s* → *t* at the head of rws, rewrite (rws, e) checks whether *s* matches e, applies the rewrite if so, and recurses on the remaining rewrites. Our optimizers use the function rewrite in a bottom-up traversal of the AST. Icing users can specify which rewrites may be applied under each distinct opt: scope in their code or use a default set (shown in Table 1).
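The rewrite function and its bottom-up use can be sketched in Python as follows, including the order dependence discussed for the program of Fig. 3. Terms are encoded as tuples and pattern variables as ("?", name); this encoding and all function names are our own assumptions rather than the Icing implementation.

```python
# Terms: ("var", name) | ("const", w) | (op, arg1, ..., argk) with op in {"+", "*", "fma"}.
# Pattern variables are written ("?", name).


def match(pattern, term, subst=None):
    """Return a substitution (dict) that makes pattern equal to term, or None."""
    subst = {} if subst is None else subst
    if pattern[0] == "?":
        name = pattern[1]
        if name in subst:  # repeated variable: must bind the same subterm
            return subst if subst[name] == term else None
        return {**subst, name: term}
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None
    if pattern[0] in ("var", "const"):  # leaves must be identical
        return subst if pattern == term else None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst


def instantiate(pattern, subst):
    """Replace the pattern variables in pattern using subst."""
    if pattern[0] == "?":
        return subst[pattern[1]]
    if pattern[0] in ("var", "const"):
        return pattern
    return (pattern[0],) + tuple(instantiate(p, subst) for p in pattern[1:])


def rewrite(rws, e):
    """Fold the rewrites (s, t) over e in list order, as in the rewrite function above."""
    for s, t in rws:
        subst = match(s, e)
        if subst is not None:
            e = instantiate(t, subst)
    return e


def optimize_bottom_up(rws, e):
    """Rewrite the children first, then the node itself."""
    if e[0] not in ("var", "const"):
        e = (e[0],) + tuple(optimize_bottom_up(rws, c) for c in e[1:])
    return rewrite(rws, e)


X, Y, Z = ("?", "x"), ("?", "y"), ("?", "z")
COMM_ADD = (("+", X, Y), ("+", Y, X))                  # x + y -> y + x
FMA_INTRO = (("+", ("*", X, Y), Z), ("fma", X, Y, Z))  # x * y + z -> fma(x, y, z)

x, y = ("var", "x"), ("var", "y")
fold_body = ("+", ("*", x, x), y)    # x * x + y (Fold body of Fig. 3)
map_body = ("+", x, ("const", 3.0))  # x + 3.0   (Map body of Fig. 3)

print(optimize_bottom_up([COMM_ADD, FMA_INTRO], fold_body))  # y + x * x, no fma
print(optimize_bottom_up([FMA_INTRO, COMM_ADD], fold_body))  # fma(x, x, y)
print(optimize_bottom_up([FMA_INTRO, COMM_ADD], map_body))   # 3.0 + x
```

Applying commutativity first destroys the *x* ∗ *y* + *z* shape that fma introduction needs, which is exactly why the order of the rewrite list matters.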


**Table 1.** Rewrites currently supported in Icing (◦∈{+, ∗})

#### **2.3 Semantics of Icing**

Next, we explain the semantics of Icing, highlighting two distinguishing features. First, values are represented as trees instead of simple floating-point words, thus delaying evaluation of arithmetic expressions. Second, rewrites in the semantics are applied nondeterministically, thus relaxing floating-point evaluation enough to prove fast-math optimizations correct.

We define the semantics of Icing programs in Fig. 2 as a big-step judgment of the form (*cfg,E,e*) → *v*. *cfg* is a configuration carrying a list of rewrites (*s* → *t*) representing allowed optimizations, and a flag tracking whether optimizations are allowed in the current program fragment under an opt: scope (OptOk). *E* is the (runtime) execution environment mapping free variables to values and *e* an Icing expression. The value *v* is the result of evaluating *e* under *E* using optimizations from *cfg*.

The first key idea of Icing's semantics is that expressions are not evaluated to (64-bit) floating-point words immediately; the semantics rather evaluates them into *value trees* representing their computation result. As an example, if *e*<sup>1</sup> evaluates to value tree *v*<sup>1</sup> and *e*<sup>2</sup> to *v*2, the semantics returns the value tree represented as *v*<sup>1</sup> + *v*<sup>2</sup> instead of the result of the floating-point addition of (flattened) *v*<sup>1</sup> and *v*2. The syntax of value trees is:

$$\begin{array}{lcl} c & ::= & b \mid \mathsf{isNaN}\, v\_1 \mid v\_1 \Box v\_2 \mid \mathsf{opt} \colon c \\ v\_1, v\_2, v\_3 & ::= & w \mid \Diamond\, v\_1 \mid v\_1 \circ v\_2 \mid \mathsf{fma}(v\_1, v\_2, v\_3) \mid \mathsf{opt} \colon v\_1 \end{array}$$

let v1 = Map (λ x. opt:(x + 3.0)) vi in let vsum = Fold (λ x y. opt:(x \* x + y)) 0.0 v1 in sqrt vsum

**Fig. 3.** A simple Icing program

Constants are again defined as floating-point words and form the leaves of value trees (variables obtain a constant value from the execution environment *E*). On top of constants, value trees can represent the result of evaluating any floating-point operation Icing supports.

The second key idea of our semantics is that it nondeterministically applies rewrites from the configuration *cfg while evaluating* expression *e* instead of just returning its value tree. In the semantics, we model the nondeterministic choice of an optimization result for a particular value tree *v* with the relation rewritesTo, where (*cfg, v*) rewritesTo *r* if either the configuration *cfg* allows for optimizations to be applied, and value tree *v* can be rewritten into value tree *r* using rewrites from the configuration *cfg*; or the configuration does not allow for rewrites to be applied, and *v* = *r*. Rewriting on value trees reuses several definitions from Sect. 2.2. We add the nondeterminism on top of the existing functions by making the relation rewritesTo pick a subset of the rewrites from the configuration *cfg* which are applied to value tree *v*.

Icing's semantics allows optimizations to be applied for arithmetic and comparison operations. The rules Unary, Binary, fma, isNaN, and Compare first evaluate argument expressions into value trees. The final result is then nondeterministically chosen from the rewritesTo relation for the obtained value tree and the current configuration. Evaluation of Map, Fold, and let-bindings follows standard textbook evaluation semantics and does not apply optimizations.

Rule Scope models the fine-grained control over where optimizations are applied in the semantics. We store in the current configuration *cfg* that optimizations are allowed in the (sub-)expression *e* (cfg with OptOk := true).

Evaluation of a conditional (if *c* then *e<sup>T</sup>* else *e<sup>F</sup>* ) first evaluates the conditional guard *c* to a value tree *cv*. Based on value tree *cv* the semantics picks a branch to continue evaluation in. This eager evaluation for conditionals (in contrast to delaying by leaving them in a value tree) is crucial to enable the later simulation proof to connect Icing to CakeML which also eagerly evaluates conditionals. As the value tree *cv* represents a delayed evaluation of a boolean value, we have to turn it into a boolean constant when selecting the branch to continue evaluation in. This is done using the functions cTree2IEEE and tree2IEEE. cTree2IEEE (v) computes the boolean value, and tree2IEEE (v) computes the floating-point word represented by the value tree v by applying IEEE 754 arithmetic operations and structural recursion.
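The functions tree2IEEE and cTree2IEEE can be pictured as the following structural recursion over a tuple encoding of value trees (the encoding and the Python names are hypothetical). Python floats serve as IEEE 754 doubles with round-to-nearest; `fma_once` emulates a fused multiply-add with exact rational arithmetic, and `ieee_div` handles the zero-divisor case that Python's `/` reports as an exception instead of returning ±inf or NaN. `opt:` nodes are evaluated by ignoring the annotation, since it only licenses rewrites.

```python
import math
from fractions import Fraction

# Value trees: ("w", float) | ("neg", v) | ("sqrt", v) | (op, v1, v2) with op in
# {"+", "-", "*", "/"} | ("fma", v1, v2, v3) | ("opt", v).
# Condition trees: ("bool", b) | ("isNaN", v) | (cmp, v1, v2), cmp in {"<", "<=", "="} | ("opt", c).


def fma_once(a, b, c):
    """a * b + c with a single rounding."""
    if math.isfinite(a) and math.isfinite(b):
        if not math.isfinite(c):
            return c  # a finite product plus inf/NaN is just c
        exact = Fraction(a) * Fraction(b) + Fraction(c)
        try:
            return float(exact)  # one correctly rounded conversion
        except OverflowError:
            return math.inf if exact > 0 else -math.inf
    return a * b + c  # a or b is inf/NaN: the product is exact, so one rounding is moot


def ieee_div(a, b):
    """IEEE 754 division; Python's / raises ZeroDivisionError for a zero divisor."""
    if b == 0.0:
        if a == 0.0 or math.isnan(a):
            return math.nan
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b


def tree2ieee(v):
    """Flatten a value tree to a double by structural recursion (cf. tree2IEEE)."""
    tag = v[0]
    if tag == "w":
        return v[1]
    if tag == "opt":  # opt: only licenses rewrites; it has no numeric effect
        return tree2ieee(v[1])
    if tag == "neg":
        return -tree2ieee(v[1])
    if tag == "sqrt":
        a = tree2ieee(v[1])
        return math.sqrt(a) if a >= 0.0 else math.nan  # sqrt of a negative (or NaN) is NaN
    if tag == "fma":
        return fma_once(*(tree2ieee(t) for t in v[1:]))
    a, b = tree2ieee(v[1]), tree2ieee(v[2])
    ops = {"+": lambda: a + b, "-": lambda: a - b,
           "*": lambda: a * b, "/": lambda: ieee_div(a, b)}
    return ops[tag]()


def ctree2ieee(c):
    """Evaluate a condition tree to a bool (cf. cTree2IEEE)."""
    tag = c[0]
    if tag == "bool":
        return c[1]
    if tag == "opt":
        return ctree2ieee(c[1])
    if tag == "isNaN":
        return math.isnan(tree2ieee(c[1]))
    a, b = tree2ieee(c[1]), tree2ieee(c[2])
    return {"<": a < b, "<=": a <= b, "=": a == b}[tag]  # comparisons with NaN are False
```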

*Example.* We illustrate Icing semantics and how optimizations are applied both in syntax and semantics with the example in Fig. 3. The example first translates the input list by 3*.*0 using a Map, and then computes the norm of the translated list with Fold and sqrt.

We want to apply *x* + *y* → *y* + *x* (commutativity of +) and fma introduction (*x* ∗ *y* + *z* → fma(*x*, *y*, *z*)) to our example program. Depending on their order, the function rewrite will produce different results.

If we first apply commutativity of +, and then fma introduction, all + operations in our example will be commuted, but no fma introduced as the fma introduction *syntactically* relies on the expression having the structure *x*∗*y*+*z* where *x, y, z* can be arbitrary. In contrast, if we use the opposite order of rewrites, the second line will be replaced by let vsum = Fold (λx y.fma(x,x,y)) 0.0 v1 and commutativity is only applied in the first line.

To illustrate how the semantics applies optimizations, we run the program on the two-element list vi = [1.0, 1.0] in a configuration that contains both rewrites. Consequently, the Map application can produce [1.0 + 3.0, 1.0 + 3.0], [3.0 + 1.0, 1.0 + 3.0], *...*, where the terms 1.0 + 3.0 and 3.0 + 1.0 correspond to the value trees representing the addition of 1.0 and 3.0.

If we apply the Fold operation to this list, there are even more possible optimization results:

[(1.0 + 3.0) \* (1.0 + 3.0) + (1.0 + 3.0) \* (1.0 + 3.0)], [(3.0 + 1.0) \* (3.0 + 1.0) + (3.0 + 1.0) \* (3.0 + 1.0)], [fma ((3.0 + 1.0), (3.0 + 1.0), (3.0 + 1.0) \* (3.0 + 1.0))], [fma ((1.0 + 3.0), (1.0 + 3.0), (3.0 + 1.0) \* (1.0 + 3.0))], ...

The first result is the result of evaluating the initial program without any rewrites, the second result corresponds to syntactically optimizing with commutativity of + and then fma introduction, and the third corresponds to using the opposite order syntactically. The last two results can only be results of semantic optimizations as commutativity and fma introduction are applied to some intermediate results of Map, but not all. There is no syntactic application of commutativity and fma-introduction leading to such results.
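The nondeterministic choice made by rewritesTo can be made concrete by enumerating it. The Python sketch below assumes, for simplicity, that rewrites act only at the root of a value tree and that a chosen subset is applied in configuration order; it collects every result of rewritesTo for the value tree 1.0 + 3.0 with commutativity and fma introduction enabled, then pairs the independent choices for the two Map elements. All function names are hypothetical.

```python
from itertools import combinations


def comm_add(v):
    """x + y -> y + x at the root of a value tree; None if it does not match."""
    return ("+", v[2], v[1]) if v[0] == "+" else None


def fma_intro(v):
    """x * y + z -> fma(x, y, z) at the root of a value tree; None if it does not match."""
    if v[0] == "+" and v[1][0] == "*":
        return ("fma", v[1][1], v[1][2], v[2])
    return None


def apply_in_order(rws, v):
    """Apply each rewrite in turn, keeping the tree unchanged when one does not match."""
    for rw in rws:
        result = rw(v)
        if result is not None:
            v = result
    return v


def rewrites_to(cfg_rws, opt_ok, v):
    """All r with (cfg, v) rewritesTo r: apply any subset of the configured rewrites
    (kept in configuration order); without OptOk the only result is v itself."""
    if not opt_ok:
        return {v}
    return {apply_in_order(subset, v)
            for k in range(len(cfg_rws) + 1)
            for subset in combinations(cfg_rws, k)}


cfg = [comm_add, fma_intro]
one_plus_three = ("+", ("w", 1.0), ("w", 3.0))

# Each Map element over vi = [1.0, 1.0] evaluates 1.0 + 3.0 and may be rewritten independently.
element = rewrites_to(cfg, True, one_plus_three)
map_results = {(a, b) for a in element for b in element}
print(len(element), len(map_results))  # 2 4
```

The four Map outcomes are all combinations of 1.0 + 3.0 and 3.0 + 1.0, including the two lists shown above; no syntactic rewrite of the Map body can produce the mixed ones.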

#### **3 Modelling Existing Compilers in Icing**

Having defined the syntax and semantics of Icing, we next implement and prove correct functions which model the behavior of previous verified compilers, like CompCert or CakeML, and of unverified compilers, like GCC or Clang. For the former, we first define a translator of Icing expressions which preserves the strict IEEE 754 meaning of its input and does not allow for any further optimizations. For the latter, we give a greedy optimizer that unconditionally optimizes expressions, mimicking the observed behavior of GCC and Clang.

#### **3.1 An IEEE 754 Preserving Translator**

The Icing semantics nondeterministically applies optimizations if they are added to the configuration. However, when compiling safety-critical code or after applying some syntactic optimizations, one might want to preserve the strict IEEE 754 meaning of an expression.

To make sure that the behavior of an expression cannot be further changed and thus the expression exhibits strict IEEE 754 compliant behavior, we have implemented the function compileIEEE754, which essentially *disallows optimizations* by replacing all optimizable expressions opt: e' with non-optimizable expressions e'. Correctness of compileIEEE754 shows that (a) no optimizations can be applied after the function has been applied, and (b) evaluation is deterministic. We have proven these properties as separate theorems.
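Because compileIEEE754 only has to remove opt: scopes, its syntactic part is a short recursion. The Python sketch below applies it to the program of Fig. 3 in a hypothetical tuple encoding and checks that no opt: annotation survives; it illustrates the transformation only, not the theorems that no optimization can fire afterwards and that evaluation becomes deterministic.

```python
def compile_ieee754(e):
    """Remove every opt: annotation, so no rewrite is licensed anywhere (cf. compileIEEE754)."""
    if not isinstance(e, tuple):
        return e  # leaf payloads: floats, names, booleans
    if e and e[0] == "opt":
        return compile_ieee754(e[1])  # opt: e'  becomes  e'
    return tuple(compile_ieee754(part) for part in e)


def contains_opt(e):
    """True if some subexpression of e is still under an opt: annotation."""
    return isinstance(e, tuple) and bool(e) and (
        e[0] == "opt" or any(contains_opt(part) for part in e))


# The program from Fig. 3 in a hypothetical tuple encoding.
x, y = ("var", "x"), ("var", "y")
fig3 = (
    "let", "v1", ("map", ("lam", ("x",), ("opt", ("+", x, ("w", 3.0)))), ("var", "vi")),
    ("let", "vsum",
     ("fold", ("lam", ("x", "y"), ("opt", ("+", ("*", x, x), y))), ("w", 0.0), ("var", "v1")),
     ("sqrt", ("var", "vsum"))),
)
strict = compile_ieee754(fig3)
print(contains_opt(fig3), contains_opt(strict))  # True False
```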

#### **3.2 A Greedy Optimizer**

Next, we implement and prove correct an optimizer that mimics the (observed) behavior of GCC and Clang as closely as possible. The optimizer applies fma introduction, associativity and commutativity greedily. All these rewrites only have an application rewrite precondition which we instantiate to True to apply the rewrites unconstrained.

To give an intuition for greedy optimization, recall the example from Fig. 3. Greedy optimization does not consider whether applying an optimization is beneficial or not: if the optimization is allowed to be applied and it matches some subexpression of an optimizable expression, it is applied. Thus the order of optimizations matters. Applying the greedy optimizer with the rewrites [associativity, commutativity, fma-introduction] to the example, we get:

let v1 = Map (λ x. opt:(3.0 + x)) vi in let vsum = Fold (λ x y. opt:(y + x \* x)) 0.0 v1 in sqrt vsum

Only commutativity has been applied as associativity does not match and the possibility for an fma-introduction is ruled out by commutativity. If we reverse the list of optimizations we obtain:

```
let v1 = Map (λ x. opt:(3.0 + x)) vi in
let vsum = Fold (λ x y. opt:(fma (x,x,y))) 0.0 v1 in sqrt vsum
```
which we consider to be a more efficient version of the program from Fig. 3.

Greedy optimization is implemented in the function optimizeGreedy (rws, e) which applies the rewrites in rws in a bottom-up traversal to expression e. In combination with the greedy optimizer our fine-grained control (using opt annotations) allows the end-user to control *where* optimizations can be applied.

We have shown correctness of optimizeGreedy with respect to Icing semantics, i.e., we have shown that optimizing greedily gives the same result as applying the greedy rewrites in the semantics:<sup>1</sup>

#### **Theorem 1.** *optimizeGreedy is correct*

```
Let E be an environment, v a value tree and cfg a configuration.
If (cfg, E, optimizeGreedy ([associativity, commutativity, fma-intro], e)) → v
then (cfg with [associativity, commutativity, fma-intro], E, e) → v.
```
<sup>1</sup> As in many verified compilers, Icing's proofs closely follow the structure of optimizations. Achieving this required careful design and many iterations; we consider the simplicity of Icing's proofs to be a strength of this work.

Proving Theorem 1 without any additional lemmas is tedious, as it requires showing the correctness of a single optimization in the presence of other optimizations while also handling the bottom-up traversal that applies it. We therefore reduce the proof of Theorem 1 to proving each rewrite correct separately and then chaining these correctness proofs together. Lemma 1 shows that applications of the function rewrite can be chained together in the semantics. This also means that adding, removing, or reordering optimizations simply requires changing the list of rewrites, which makes Icing easy to extend.

#### **Lemma 1.** *rewrite is compositional*

*Let e be an expression, v a value tree, s* → *t a rewrite, and rws a list of rewrites. If the rewrite s* → *t can be correctly simulated in the semantics, and the list rws can be correctly simulated in the semantics, then the list of rewrites* (*s* → *t*) :: *rws can be correctly simulated in the semantics.*

#### **4 A Conditional Optimizer**

We have implemented an IEEE 754 optimizer which has the same behavior as CompCert and CakeML, and a greedy optimizer with the (observed) behavior of GCC and Clang. The fine-grained control of where optimizations are applied is essential for the usability of the greedy optimizer. However, in this section we explain that the control provided by the opt annotation is often not enough. We show how preconditions can be used to provide additional constraints on where rewrites can be applied, and sketch how preconditions serve as an interface between the compiler and external tools, which can and should discharge them.

We observe that in many cases, whether an optimization is acceptable or not can be captured with a precondition *on the optimization itself*, and not on every arithmetic operation separately. One example of such an optimization is the removal of NaN checks: a NaN check should only be removed if it can never succeed.

We argue that both application and compiler rewrite preconditions should be discharged by external tools. Many interesting preconditions for a rewrite depend on a global analysis. Running a global analysis as part of a compiler is infeasible, as maintaining separate analyses for each rewrite is not likely to scale. We thus propose to expose an *interface to external tools* in the form of preconditions.

We implement this idea in the *conditional optimizer* optimizeCond, which supports three different applications of fast-math optimizations: applying optimizations rws unconstrained (uncond rws), applying optimizations if precondition P is true (cond P rws), and applying optimizations under the assumptions generated by function A, which should be discharged externally (assume A rws). When applying cond, optimizeCond checks whether precondition P is true before optimizing, whereas for assume the propositions returned by A are assumed and should then be discharged separately by a static analysis or a manual proof.

Correctness of optimizeCond relates syntactic optimizations to applying optimizations in the semantics. Similar to optimizeGreedy, we designed the proof modularly such that it suffices to prove correct each rewrite individually.

Our optimizer optimizeCond takes as arguments first a list of rewrite applications using uncond, cond, and assume, and then an expression e. If the list is empty, we have optimizeCond ([], e) = e. Otherwise, the first rewrite application is applied to e in a bottom-up traversal and optimization continues recursively with the rest of the list. For uncond rws, the rewrites are applied if they match; for cond P rws, the precondition P is checked for the expression being optimized and the rewrites rws are applied if P is true; for assume A rws, the function A is evaluated on the expression being optimized. If execution of A fails, no optimization is applied. Otherwise, A returns a list of assumptions, which are logged by the compiler, and the rewrites are applied.
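The dispatch on uncond, cond, and assume described above can be sketched as follows. This is our own illustration, not Icing's HOL4 definition: the tuple encoding and the helper names (apply_rws, bottom_up, commutativity) are hypothetical, assumptions are represented as arbitrary Python values, and for brevity the sketch ignores opt: annotations. Each application is processed in order, mirroring the recursion on the list.

```python
# Hypothetical encoding (ours): ("var", n) | ("const", c) | ("add", a, b) | ("mul", a, b).
# Rewrite applications: ("uncond", rws) | ("cond", P, rws) | ("assume", A, rws).
# For brevity this sketch ignores opt: annotations.


def commutativity(e):
    if e[0] in ("add", "mul"):
        return (e[0], e[2], e[1])
    return None


def apply_rws(rws, node):
    # apply each matching rewrite in list order
    for rw in rws:
        out = rw(node)
        if out is not None:
            node = out
    return node


def bottom_up(step, e):
    # rebuild e from the leaves upward, calling step on every inner node
    if e[0] in ("var", "const"):
        return e
    return step((e[0],) + tuple(bottom_up(step, c) for c in e[1:]))


def optimize_cond(apps, e):
    """Return (optimized expression, list of logged assumptions)."""
    log = []
    for app in apps:                      # an empty list leaves e unchanged
        kind, rws = app[0], app[-1]

        def step(node):
            if kind == "uncond":
                return apply_rws(rws, node)
            if kind == "cond":            # check P on the node first
                return apply_rws(rws, node) if app[1](node) else node
            extra = app[1](node)          # "assume": A may fail (None)
            if extra is None:
                return node
            log.extend(extra)             # to be discharged externally
            return apply_rws(rws, node)

        e = bottom_up(step, e)
    return e, log
```

For example, with commutativity as the only rewrite, a cond application whose precondition accepts only multiplications turns x * y + 3.0 into y * x + 3.0, while an assume application returns the assumptions produced by A alongside the rewritten expression.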

Using the interface provided by preconditions, one can prove external theorems showing additional properties of a compiler run using application rewrite preconditions, and external theorems showing how to discharge compiler rewrite preconditions with static analysis tools or a manual proof. We will call such external theorems *meta theorems*.

In the following we discuss two possible meta theorems, highlighting key steps required for implementing (and proving) them. A complete implementation consists of two connections: (1) from the compiler to rewrite preconditions and (2) from rewrite preconditions to external tools. We implement (1) independently of any particular tool. A complete implementation of (2) is out of the scope of this paper; meta theorems generally depend on global analyses which are orthogonal to designing Icing, but several external tools already provide functionality that is a close match to our interface, and we sketch possible connections below. We note that for these meta theorems, optimizeCond should track the context in which an assumption is made and use the context to express assumptions as *local* program properties. Our current optimizeCond implementation does not collect this contextual information yet, as this information at least partially depends on the particular meta theorems desired.

#### **4.1 A Logging Compiler for NaN Special Value Checks**

We show how a meta theorem can be used to discharge a compiler rewrite precondition using the example of removing a NaN check. In general, removing a NaN check can be unsound if the check could have succeeded. Inferring statically whether a value can be a NaN special value or not requires either a global static analysis or a manual proof over all possible executions.

Preconditions are our interface to external tools. For NaN check removal, we implement a function removeNaNcheck e that returns the assumption that no NaN special value can be the result of evaluating the argument expression *e*. Function removeNaNcheck could then be used as part of an assume rule for optimizeCond. We prove a strengthened correctness theorem for NaN check removal, showing that if the assumption returned by removeNaNcheck is discharged externally (i.e., by the end-user or via static analysis), then we can simulate applying NaN check removal syntactically in Icing semantics *without additional side conditions*.

Since the assumption from removeNaNcheck is made when optimizing, it is additionally returned as part of the result of optimizeCond. Such assumptions can be discharged by static analyzers like Verasco [22] or Gappa [17].
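A NaN-check removal in the style of an assume rule can be sketched as follows. The encoding (isnan and bool nodes), the helper optimize_assume, and the representation of the assumption as the tuple ("notNaN", e) are hypothetical choices of ours; in Icing the assumption is a HOL4 proposition about the argument expression.

```python
# Hypothetical encoding (ours): ("var", n) | ("const", c) | ("bool", b)
# | ("isnan", e) | ("if", cond, then, else) | ("add", a, b) | ...
LEAVES = ("var", "const", "bool")


def remove_nan_check(node):
    """Assume-function: succeeds only on a NaN check and returns the
    assumption that the checked expression never evaluates to NaN."""
    if node[0] == "isnan":
        return [("notNaN", node[1])]
    return None                  # fails: no rewrite is applied here


def nan_check_to_false(node):
    # isnan(e) -> false, justified only by the logged assumption
    return ("bool", False) if node[0] == "isnan" else None


def optimize_assume(assume_fn, rewrite, e, log):
    """Bottom-up assume rule: log assume_fn's result, then rewrite."""
    if e[0] in LEAVES:
        return e
    node = (e[0],) + tuple(optimize_assume(assume_fn, rewrite, c, log) for c in e[1:])
    extra = assume_fn(node)
    if extra is None:
        return node
    log.extend(extra)
    out = rewrite(node)
    return node if out is None else out
```

The logged ("notNaN", e) entries play the role of the assumptions that an external tool would have to discharge.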

#### **4.2 Proving Roundoff Error Improvement**

Rewrites like associativity and distributivity change the results of floating-point programs. One way of capturing this behavior for a single expression is to compute the roundoff error, i.e. the difference between an idealized real-valued and a floating-point execution of the expression.

To compute an upper bound on the roundoff error, various formally verified tools have been implemented [3,17,30,37]. A possible meta theorem is thus to show that applying a particular list of optimizations does not increase the roundoff error of the optimized expression but only decreases or preserves it. The meta theorem for this example would show that (a) all the applied syntactic rewrites can be simulated in the semantics and (b) the worst-case roundoff error of the optimized expression is smaller than or equal to the error of the input expression. Our development already proves (a), and we sketch the steps necessary to show (b) below.

We can leverage these roundoff error analysis tools as application preconditions in a cond rule, checking whether a rewrite should be applied or not in optimizeCond. For a particular expression e, an application precondition (check (s→t, e)) would return true if applying rewrite s→t does not increase the roundoff error of e.

**Theorem 2.** *check decreases roundoff error.*

$$(cfg, E, \mathit{optimizeCond}\ (\mathit{Cond}\ (\lambda e.\ \mathit{check}\ (s \to t, e)))\ e) \to v \implies (cfg \text{ with } opts := cfg.opts \cup \{s \to t\}, E, e) \to v \;\wedge\; \mathit{error}\ (\mathit{optimizeCond}\ (\mathit{Cond}\ (\lambda e.\ \mathit{check}\ (s \to t, e)))\ e) \le \mathit{error}\ e$$

Implementing check (s→t, e) requires computing a roundoff error for expression e and one for e rewritten with s→t, and returning True if and only if the roundoff error has not increased by applying the rewrite. Proving the theorem would require giving a real-valued semantics for Icing, connecting Icing's semantics to the semantics of the roundoff error analysis tool, and a global range analysis on the Icing programs, which can be provided by Verasco or Gappa.
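The check described above can be sketched with a toy static roundoff analysis. This is not the analysis of any of the cited tools: it is a simplified first-order bound of our own, assuming exact inputs with known ranges, binary64 round-to-nearest (unit roundoff 2^-53) without overflow or underflow, and only + and *. Bounds are computed with exact rationals (fractions.Fraction) so that the analysis itself introduces no rounding. The helpers analyze, error, assoc_right, and assoc_left are hypothetical, and check only tries the rewrite at the root of e.

```python
from fractions import Fraction

EPS = Fraction(1, 2**53)  # unit roundoff of binary64, round-to-nearest

# Hypothetical encoding (ours): ("var", name) | ("add", a, b) | ("mul", a, b).
# Inputs are assumed exact; `ranges` maps each variable to (lo, hi).


def analyze(e, ranges):
    """Return (real-valued range, absolute roundoff error bound) of e.
    Toy first-order analysis; ignores overflow and underflow."""
    if e[0] == "var":
        lo, hi = ranges[e[1]]
        return (Fraction(lo), Fraction(hi)), Fraction(0)
    (alo, ahi), ea = analyze(e[1], ranges)
    (blo, bhi), eb = analyze(e[2], ranges)
    if e[0] == "add":
        lo, hi = alo + blo, ahi + bhi
        prop = ea + eb                        # errors propagated from operands
    else:                                     # "mul"
        prods = [alo * blo, alo * bhi, ahi * blo, ahi * bhi]
        lo, hi = min(prods), max(prods)
        amax, bmax = max(abs(alo), abs(ahi)), max(abs(blo), abs(bhi))
        prop = amax * eb + bmax * ea + ea * eb
    # new rounding error: at most EPS * |computed value|,
    # and |computed value| <= max |real value| + propagated error
    return (lo, hi), prop + EPS * (max(abs(lo), abs(hi)) + prop)


def error(e, ranges):
    return analyze(e, ranges)[1]


def check(rewrite, e, ranges):
    """True iff `rewrite` matches at the root of e and does not increase the bound."""
    t = rewrite(e)
    return t is not None and error(t, ranges) <= error(e, ranges)


def assoc_right(e):
    # (a + b) + c -> a + (b + c)
    if e[0] == "add" and e[1][0] == "add":
        return ("add", e[1][1], ("add", e[1][2], e[2]))
    return None


def assoc_left(e):
    # a + (b + c) -> (a + b) + c
    if e[0] == "add" and e[2][0] == "add":
        return ("add", ("add", e[1], e[2][1]), e[2][2])
    return None
```

With x in [1000, 2000] and y, z in [0, 1], reassociating (x + y) + z into x + (y + z) roughly halves this bound, since only one rounding of a value near 2000 remains instead of two; check therefore accepts that rewrite and rejects the reverse one.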

#### **4.3 Supporting Distributivity in** optimizeCond

The rewrites considered up to this point do not duplicate any subexpressions in the optimized output. In this section, we consider rewrites which do introduce additional occurrences of subexpressions, which we dub *duplicative rewrites*. Common duplicative rewrites are distributivity of ∗ over + (*x* ∗ (*y* + *z*) ↔ *x* ∗ *y* + *x* ∗ *z*) and rewriting a single multiplication into multiple additions (*x* ∗ *n* ↔ *x* + ⋯ + *x*, with *n* summands). Here we consider distributivity as an example. A compiler might want to use this optimization to apply further strength reductions or fma introduction.

The main issue with duplicative rewrites is that they add new occurrences of a matched subexpression. Applying (*x* ∗ (*y* + *z*) → *x* ∗ *y* + *x* ∗ *z*) to e1 \* (2 + x) returns e1 \* 2 + e1 \* x. The values of the two occurrences of e1 may differ, because further optimizations may be applied to only one of its occurrences.

Any correctness proof for such a duplicative rewrite must match up the two (potentially different) executions of e1 in the optimized expression (e1 \* 2 + e1 \* x) with the execution of e1 in the initial expression (e1 \* (2 + x)). This can only be achieved by finding a common intermediate optimization (resp. evaluation) result shared by both occurrences of e1 in e1 \* 2 + e1 \* x.

In general, the existence of such an intermediate result can only be proven for expressions that do not depend on "eager" evaluation, i.e., expressions that consist only of let-bindings and arithmetic. We illustrate the problem using a conditional (if c then e1 else e2). In Icing semantics, the guard c is first evaluated to a value tree cv. Next, the semantics evaluates cv to a boolean value *b* using function cTree2IEEE. Computing *b* from cv loses the structural information of value tree cv, as it computes the results of previously delayed arithmetic operations. This loss of information means that rewrites that previously matched the structure of cv may no longer apply to *b*.

This is not a bug in the Icing semantics. On the contrary, our semantics makes this issue explicit, while in other compilers it can lead to unexpected behavior (e.g., in GCC's support for distributivity under fast-math). CakeML, for example, also eagerly evaluates conditionals and similarly loses structural information about optimizations that otherwise may have been applied. Having lazy conditionals in general would only "postpone" the issue until eager evaluation of the conditional expression for a loop is necessary.

An intuitive compiler precondition that enables proving duplicative rewrites is to forbid any control dependencies on the expression being optimized. However, this approach may be unsatisfactory, as it disallows branching on the results of optimized expressions and requires a verified dependency analysis that must be rerun or incrementally updated after every rewrite, which could become a bottleneck for fast-math optimizers. Instead, in Icing we restrict duplicative rewrites to fire only when pattern variables are matched against program variables, e.g., pattern variables *a, b, c* only match against program variables x, y, z. This restriction to matching only let-bound variables is more scalable, as it can easily be checked syntactically, and allows us to loosen the restriction on control-flow dependence by simply let-binding subexpressions as needed.
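The variable-only restriction can be sketched as a syntactic guard on the rewrite. The encoding and the helper let_bind_left are hypothetical illustrations of ours; in particular, let_bind_left shows one way a compiler might introduce a let-binding so that the guarded rewrite can fire, not how Icing does it.

```python
# Hypothetical encoding (ours): ("var", n) | ("const", c) | ("add", a, b)
# | ("mul", a, b) | ("let", name, bound_expr, body).


def is_var(e):
    return e[0] == "var"


def distribute(e):
    """x * (y + z) -> x * y + x * z, fired only when x, y, z are program
    variables: copying a variable cannot duplicate an optimizable computation."""
    if e[0] == "mul" and e[2][0] == "add":
        x, (_, y, z) = e[1], e[2]
        if is_var(x) and is_var(y) and is_var(z):
            return ("add", ("mul", x, y), ("mul", x, z))
    return None


def let_bind_left(e, name):
    """Let-bind the left operand of a product so that distribute can fire
    on the body (one possible way to introduce the binding)."""
    return ("let", name, e[1], ("mul", ("var", name), e[2]))
```

Here distribute refuses (a + b) * (y + z), since duplicating a + b would create two independently optimizable copies, but after let-binding the left operand to t it rewrites the body to t * y + t * z.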

#### **5 Connecting to CakeML**

**Fig. 4.** Simulation diagram for Icing and the designed optimizers

We have shown how to apply optimizations in Icing and how to use it to preserve IEEE 754 semantics. Next, we describe how we connected Icing to an existing verified compiler by implementing a translation from Icing source to CakeML source and proving an equivalence theorem.<sup>2</sup> The translation function toCML maps Icing syntax to CakeML syntax; we highlight the most interesting cases. The translations of Ith, Map, and Fold relate an Icing execution to a predefined function from the CakeML standard library, and we show separate theorems relating executions of these list operations in Icing to CakeML closures of the library functions. The predicate isNaN e is true in Icing semantics if and only if e is a NaN special value, and it is translated to toCML e <> toCML e. Recall that floating-point NaN values are unordered (they compare unequal even to themselves), so isNaN can be implemented with a disequality check.
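The self-inequality trick behind the translation of isNaN can be checked directly in any language whose floats follow IEEE 754 comparison semantics. The Python sketch below assumes the platform's float is an IEEE 754 binary64 (as with CPython on mainstream platforms) and cross-checks against math.isnan; is_nan is our own name, not part of the CakeML translation.

```python
import math


def is_nan(v: float) -> bool:
    # Mirrors isNaN e  ~>  e <> e: under IEEE 754 comparison rules,
    # NaN is the only value that compares unequal to itself.
    return v != v


# Cross-check against the standard library on some special values.
for v in (0.0, -0.0, 1.5, math.inf, -math.inf, math.nan, math.inf - math.inf):
    assert is_nan(v) == math.isnan(v)
```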

To show that our translation function toCML correctly translates Icing programs into CakeML source, we proved a simulation between the two semantics, illustrated in Fig. 4. The top part consists of the correctness theorems we have shown for the optimizers, relating syntactic optimization to semantic rewriting. In the bottom part, we relate a *deterministic* Icing execution that does not apply optimizations to CakeML source semantics and prove an equivalence. For the backward simulation between CakeML and Icing, we require the Icing program to be well-typed, which is checked independently.

#### **6 Related Work**

*Verified Compilation of Floating-Point Programs.* CompCert [25] uses a constructive formalization of IEEE 754 arithmetic [6] based on Flocq [7] which allows for verified constant propagation and strength reduction optimizations for divisions by powers of 2 and replacing *x*×2 by *x*+*x*. The situation is similar for CakeML [38] whose floating-point semantics is based on HOL's [19,20]. With Icing, we propose a semantics which allows important floating-point rewrites in a verified compiler by allowing users to specify a larger set of possible behaviors for their source programs. The precondition mechanism serves as an interface to external tools. While Icing is implemented in HOL, our techniques are not specific to higher-order logic or the details of CakeML and we believe that an analog of our "verified fast-math" approach could easily be ported to CompCert.

The Alive framework [27] has been extended to verify floating-point peephole optimizations [29,31]. While these tools relax some exceptional (NaN) cases, most optimizations still need to preserve "bit-for-bit" IEEE 754 behavior, which precludes valuable rewrites like the fma introductions Icing supports.

<sup>2</sup> We also extended the CakeML source semantics with an fma operation, as CakeML's compilation currently does not support mapping fma's to hardware instructions.

*Optimization of Floating-Point Programs.* 'Mixed-precision tuning' can increase performance by decreasing precision at the expense of accuracy, for instance from double to single floating-point precision. Current tools [11,13,16,35] ensure that a user-provided error bound is satisfied either through dynamic or static analysis. In this work, we consider only uniform 64-bit floating-point precision, but Icing's optimizations are equally applicable to other precisions. Optimizations such as mixed-precision tuning are, however, out of scope for a compiler setting, as they require error bound annotations for kernel functions.

Spiral [33] uses real-valued linear algebra identities for rewriting at the algorithmic level to choose a layout which provides the best performance for a particular platform, but due to operation reordering it does not preserve IEEE 754 semantics. Herbie [32] optimizes for accuracy rather than performance, by applying rewrites which are mostly based on real-valued identities. The optimizations performed by Spiral and Herbie go beyond what traditional compilers perform, but they fit our view that it is sometimes beneficial to relax the strict IEEE 754 specification, and could be considered in an extended implementation of Icing. On the other hand, STOKE's floating-point superoptimizer [36] for x86 binaries does not preserve real-valued semantics, and only provides approximate correctness using dynamic analysis.

*Analysis and Verification of Floating-Point Programs.* Static analyses for bounding roundoff errors of finite-precision computations w.r.t. a real-valued semantics [15,17,18,28,30,37] (some with formal certificates in Coq or HOL) are currently limited to short, mostly straight-line functions and require fine-grained domain annotations at the function level. Whole-program accuracy can be formally verified w.r.t. a real-valued implementation with substantial user interaction and expertise [34]. Verification of elementary function implementations has also recently been automated, but requires substantial compute resources [23].

On the other hand, static analyses aiming to verify the absence of runtime exceptions like division by zero [4,10,21,22] scale to realistic programs. We believe that such tools can be used to satisfy preconditions and thus Icing would serve as an interface between the compiler and such specialized verification techniques.

The KLEE symbolic execution engine [9] has support for floating-point programs [26] through an interface to Z3's floating-point theory [8]. This theory is also based on IEEE 754 and will thus not be able to verify the kind of optimizations that Icing supports.

#### **7 Conclusion**

We have proposed a novel semantics for IEEE 754-unsound floating-point compiler optimizations which allows them to be applied in a verified compiler setting and which captures the intuitive semantics developers often use today when reasoning about their floating-point code. Our semantics is nondeterministic in order to provide the compiler the freedom to apply optimizations where they are useful for a particular application and platform—but within clearly defined bounds. The semantics is flexible from the developer's perspective, as it provides fine-grained control over which optimizations are available and where in a program they can be applied. We have presented a formalization in HOL4, implemented three prototype optimizers, and connected them to the CakeML verified compiler frontend. For our most general optimizer, we have explained how it can be used to obtain meta-theorems for its results by exposing a well-defined interface in the form of preconditions. We believe that our semantics can be integrated fully with different verified compilers in the future, and bridge the gap between compiler optimizations and floating-point verification techniques.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Sound Approximation of Programs with Elementary Functions**

Eva Darulova<sup>1</sup>(B) and Anastasia Volkova<sup>2</sup>

<sup>1</sup> MPI-SWS, Saarland Informatics Campus, Saarbrücken, Germany, eva@mpi-sws.org

<sup>2</sup> Inria, Lyon, France, anastasia.volkova@inria.fr

**Abstract.** Elementary function calls are a common feature in numerical programs. While their implementations in mathematical libraries are highly optimized, function evaluation is nonetheless very expensive compared to plain arithmetic. Full accuracy is, however, not always needed. Unlike arithmetic, where the performance difference between, for example, single and double precision floating-point arithmetic is relatively small, elementary function calls provide a much richer tradeoff space between accuracy and efficiency. Navigating this space is challenging, as guaranteeing accuracy and choosing parameters that yield good performance of approximations are highly nontrivial. We present a fully automated approach and a tool which approximates elementary function calls inside small programs while guaranteeing overall user-given error bounds. Our tool leverages existing techniques for roundoff error computation and approximation of individual elementary function calls, and provides an automated methodology for the exploration of the parameter space. Our experiments show that significant efficiency improvements are possible in exchange for reduced, but guaranteed, accuracy.

#### **1 Introduction**

Numerical programs face an inherent tradeoff between accuracy and efficiency. Choosing a larger finite precision provides higher accuracy, but is generally more costly in terms of memory and running time. Not all applications, however, need a very high accuracy to work correctly. We would thus like to compute the results with only as much accuracy as is needed, in order to save resources.

Navigating this tradeoff between accuracy and efficiency is challenging. First, estimating the accuracy, i.e. bounding roundoff and approximation errors, is nontrivial due to the complex nature of finite-precision arithmetic which inevitably occurs in numerical programs. Second, the space of possible implementations is usually prohibitively large and thus cannot be explored manually.

Today, users can choose between different automated tools for analyzing accuracy of floating-point programs [7,8,11,14,18,20,26] as well as for choosing between different precisions [5,6,10]. The latter tools perform mixed-precision tuning, i.e. they assign different floating-point precisions to different operations, and can thus improve the performance w.r.t. a uniform precision implementation. The success of such an optimization is, however, limited to the case when uniform precision is just barely not enough to satisfy a given accuracy specification.

Elementary functions (e.g. sin, exp) are another possible target for performance optimizations. By default, users choose single- or double-precision libm library function implementations, whose interfaces are specified in the C language standard (ISO/IEC 9899:2011) and which provide high accuracy. Such implementations are, however, expensive. When high accuracy is not needed, we can save significant resources by replacing libm calls with coarser approximations, opening up a larger and different tradeoff space than mixed-precision tuning. Unfortunately, existing automated approaches [1,25] do not provide accuracy guarantees.

On the other hand, tools like Metalibm [3] approximate *individual* elementary functions by polynomials with rigorous guarantees for a user-given accuracy. They, however, do not consider entire programs and leave the selection of their parameters to the user, which limits their usability mostly to experts.

We present an approach and a tool which leverages the existing whole-program error analysis of Daisy [8] and Metalibm's elementary function approximation to provide both *sound whole-program guarantees* as well as *efficient* C implementations for floating-point programs with elementary function calls. Given a target error specification, our tool automatically distributes the error budget among uniform single or double precision arithmetic operations and elementary functions, and selects a suitable polynomial degree for their approximation.

We have implemented our approach inside the tool Daisy and compare the performance of the generated programs against programs using libm on examples from the literature. The benchmarks spend on average 38% and up to 50% of their running time evaluating elementary functions. Our tool improves the overall performance by on average 14% and up to 25% when approximating each elementary function call individually, and by on average 17% and up to 31% when approximating compound function calls. These improvements were achieved solely by optimizing approximations to elementary functions and illustrate the pertinence of our approach. These performance improvements incur overall whole-program errors which are only 2–3 orders of magnitude larger than those of double-precision implementations using libm functions, and are well below the errors of single-precision implementations. Our tool thus allows users to effectively trade performance for larger, but guaranteed, error bounds.

*Contributions.* In summary, in this paper we present: (1) the first approximation technique for elementary functions with sound whole-program error guarantees, (2) an experimental evaluation on benchmarks from literature, and (3) an implementation, which is available at https://github.com/malyzajko/daisy.

**Related Work.** Several static analysis tools bound roundoff errors of floating-point computations [7,18,20,26], assuming libm implementations, or verify the correctness of several functions in Intel's libm library [17]. Muller [21] provides a good overview of the approximation of elementary functions. Approaches for improving the performance of numerical programs include mixed-precision tuning [5,6,10,16,24] and autotuning, which performs low-level transformations that preserve real-valued semantics [23,27]. These leverage a different part of the tradeoff space than libm approximation and are thus orthogonal. Herbie [22] and Sardana [7] improve accuracy by rewriting the non-associative finite-precision arithmetic, which is complementary to our approach. Approaches which approximate entire numerical programs include MCMC search [25], enumerative program synthesis [1] and neural approximations [13]. In these approaches, accuracy is only checked on a small set of sample inputs and is thus not guaranteed.

#### **2 Our Approach**

We explain our approach using the following example [28] computing a forward kinematics equation and written in Daisy's real-valued specification language:

Although this program is relatively simple, it still presents an opportunity for performance savings, especially when it is called often, e.g. during the motion of a robotics arm. Assuming double-precision floating-point arithmetic and library implementations for sine, Daisy's static analysis determines the worst-case absolute roundoff error of the result to be 3.44e-15. This is clearly a much smaller error than what the user requested (1e-11) in the postcondition (ensuring clause).

The two elementary function calls to sin account for roughly 40.7% of the overall running time. We can save some of this running time using polynomial approximations, which our tool generates in less than 6 min. The new double-precision C implementation is roughly 15.6% faster than one with libm<sup>1</sup> functions, i.e., it uses around 40% of the available margin. This is a noteworthy performance improvement, considering that we only optimized the evaluation of elementary functions. The actual error of the approximate implementation is 1.56e-12, i.e., roughly three orders of magnitude higher than the libm error. This error is still much smaller than if we had used a uniform single precision implementation, which incurs a total error of 1.85e-6.
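The figures quoted above can be related by simple arithmetic on the reported numbers (our own worked check, not additional measurements): the 15.6% speedup is about 38% of the 40.7% of running time spent in sin, the error grows by a factor of about 450 (roughly $10^{2.7}$) relative to libm, a uniform single-precision version would be about six orders of magnitude less accurate still, and the result stays within the requested bound.

$$\frac{15.6\,\%}{40.7\,\%} \approx 0.38, \qquad \frac{1.56 \cdot 10^{-12}}{3.44 \cdot 10^{-15}} \approx 4.5 \cdot 10^{2}, \qquad \frac{1.85 \cdot 10^{-6}}{1.56 \cdot 10^{-12}} \approx 1.2 \cdot 10^{6}, \qquad 1.56 \cdot 10^{-12} \le 10^{-11}$$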

We implement our approach inside the Daisy framework [8], combining Daisy's static dataflow analysis for bounding finite-precision roundoff errors, Metalibm's automated generation of efficient polynomial approximations, as well as a novel error distribution algorithm. Our tool furthermore automatically selects a suitable polynomial degree for approximations to elementary functions.

<sup>1</sup> There are various different implementations of libm that depend on the operating system and programming language. Here when referring to libm we mean the GNU libc implementation (https://www.gnu.org/software/libc/).

Unlike previous work, our tool *guarantees* that the user-specified error is satisfied. It soundly distributes the overall error budget among arithmetic operations and libm calls using Daisy's static analysis. Metalibm uses the state-of-the-art minimax polynomial approximation algorithm [2], as well as Sollya [4] and Gappa [12] to bound the errors of its implementations. Given a function, a target relative error bound, and implementation parameters, Metalibm generates C code. Our tool is not guaranteed to find the most efficient implementation: the search space of implementation and approximation choices is highly complex and discrete, and it is thus infeasible to find the optimal parameters.

The input to our tool is a straight-line program<sup>2</sup> with standard arithmetic operators (+, −, ∗, /) as well as the most commonly used elementary functions (sin, cos, tan, log, exp, √). The user further specifies the domains of all inputs, together with a target overall absolute error which must be satisfied. The output is C code with arithmetic operations in uniform single or double precision, and libm approximations in double precision (Metalibm's only supported precision).

*Algorithm.* We will use 'program' for the entire expression, and 'function' for individual elementary functions. Our approach works in the following steps.

**Step 1** We re-use Daisy's frontend which parses the input specification. We add a pre-processing step, which decomposes the abstract syntax tree (AST) of the program we want to approximate such that each elementary function call is assigned to a fresh local variable. This transformation eases the later replacement of the elementary functions with an approximation.
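The pre-processing in Step 1 can be sketched as a recursive pass that hoists every elementary function call into a fresh local binding, innermost calls first. The AST encoding, the ("call", fname, arg) node, and the _tmpN naming scheme are hypothetical and chosen for illustration; Daisy's actual frontend and AST differ.

```python
import itertools

# Hypothetical encoding (ours): ("var", n) | ("const", c) | (op, a, b) with
# op in "+", "-", "*", "/" | ("call", fname, arg) for an elementary function.


def decompose(e, fresh=None, bindings=None):
    """Return (bindings, expr): every elementary function call is bound to a
    fresh local variable (innermost calls first) and replaced by that variable."""
    if fresh is None:
        fresh, bindings = itertools.count(1), []
    if e[0] in ("var", "const"):
        return bindings, e
    if e[0] == "call":
        _, arg = decompose(e[2], fresh, bindings)
        name = f"_tmp{next(fresh)}"
        bindings.append((name, ("call", e[1], arg)))
        return bindings, ("var", name)
    children = [decompose(c, fresh, bindings)[1] for c in e[1:]]
    return bindings, (e[0],) + tuple(children)
```

Each binding then names exactly one elementary function call, which is the unit that a later step can replace with an approximation.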

**Step 2** We use Daisy's roundoff error analysis on the entire program, assuming a libm implementation of elementary functions. This analysis computes a real-valued range and a worst-case absolute roundoff error bound for each subexpression in the AST, assuming uniform single or double precision as appropriate. We use this information in the next step to distribute the error and to determine the parameters for Metalibm for each function call.

**Step 3** This is the core step, which calls Metalibm to generate a (piecewise) polynomial approximation for each elementary function that was assigned to a local variable. Each call to Metalibm specifies the local target error, the polynomial degree, and the domain of the function call's arguments. To determine the argument domains, we use the range and error information obtained in the previous step. Our tool tries different polynomial degrees and selects the fastest implementation. We explain our error distribution and polynomial selection further below.

Metalibm generates efficient double-precision C code including argument reduction (if applicable), domain splitting, and polynomial approximation with a guaranteed error below the specified target error (or returns an error). Metalibm furthermore supports approximations with lookup tables, whose size the user can control manually via our tool frontend as well.
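The degree search in Step 3 can be illustrated with a deliberately simplified stand-in for Metalibm. Metalibm computes minimax approximations with argument reduction and domain splitting and bounds their errors with Sollya and Gappa; the sketch below instead uses a truncated Taylor series of sin with its Lagrange remainder bound, and treats the smallest sufficient degree as the fastest, whereas the tool actually compares implementations. The functions taylor_sin, remainder_bound, and select_degree are hypothetical.

```python
import math


def taylor_sin(x, degree):
    """Degree-`degree` Taylor polynomial of sin around 0 (odd terms only)."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range((degree + 1) // 2))


def remainder_bound(r, degree):
    # Lagrange remainder: every derivative of sin is bounded by 1, so
    # |sin x - T_n(x)| <= r^(n+1) / (n+1)!  for |x| <= r.
    # (Evaluated in floating point, so not itself rigorously rounded.)
    return r ** (degree + 1) / math.factorial(degree + 1)


def select_degree(r, target, max_degree=20):
    """Smallest degree whose error bound on [-r, r] meets `target`;
    a lower degree is assumed to be faster (a crude cost proxy)."""
    for d in range(1, max_degree + 1):
        if remainder_bound(r, d) <= target:
            return d
    return None  # no candidate meets the target
```

For sin on [-0.5, 0.5] with a target of 1e-6, degrees up to 6 have bounds above the target, so the search stops at degree 7.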

<sup>2</sup> All existing approaches for analysing floating-point roundoff errors that handle loops or conditional branches reduce the reasoning about errors to straight-line code, e.g. through loop invariants [9,14], loop unrolling [7], or path-wise analysis [7,9,15].

**Step 4** Our tool performs roundoff error analysis again, this time taking into account the new approximations' precise error bounds reported by Metalibm. Finally, Daisy generates C code for the program itself, as well as all necessary headers to link with the approximation generated by Metalibm.

*Error Distribution.* In order to call Metalibm, Daisy needs to determine the target error for each libm call. Recall that the user of our tool only specifies the *total* error at the end of the program. Hence, distributing the total error budget among arithmetic operations and (potentially several) elementary function calls is a crucial step. Consider again our running example which has two elementary function calls. Our tool distributes the error budget as follows:

$$|f(x) - \tilde{f}(\tilde{x})| \le |f(x) - \hat{f}_1(x)| + |\hat{f}_1(x) - \hat{f}_2(x)| + |\hat{f}_2(x) - \tilde{f}(\tilde{x})|$$

where we denote by $f$ the real-valued specification of the program; $\hat{f}_1$ and $\hat{f}_2$ have one and two elementary function calls approximated, respectively, and arithmetic is considered exact; and $\tilde{f}$ is the final finite-precision implementation.

Daisy first determines the budget for the finite-precision roundoff error ($|\hat{f}_2(x) - \tilde{f}(\tilde{x})|$) and then distributes the remaining part among the libm calls. At this point, Daisy cannot compute $|\hat{f}_2(x) - \tilde{f}(\tilde{x})|$ exactly, as the approximations are not available yet. Instead, it assumes libm-based approximations as a baseline.

Then, Daisy distributes the remaining error budget either equally among the elementary function calls, or by taking into account that the approximation errors are propagated differently through the program. This error propagation is estimated by computing the derivative w.r.t. each elementary function call (which gives an estimate of the condition number). Daisy computes partial derivatives symbolically and maximizes them over the specified input domain.

Finally, we obtain an error budget for each libm call, representing the total error due to the elementary function call *at the end of the program*. For calling Metalibm, however, we need the *local* error at the function call site. Due to error propagation, these two errors can differ significantly, and may lead to overall errors which exceed the error bound specified by the user. We estimate the error propagation using a linear approximation based on derivatives, and use this estimate to compute a *local* target error from the total error budget.
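The budget split and the conversion to local targets can be sketched as follows, assuming equal distribution and precomputed maxima of |∂f/∂y<sub>i</sub>| over the input domain, where y<sub>i</sub> is the fresh local variable holding the i-th call's result. The function name is hypothetical, the derivative-based distribution variant is not reproduced, and returning infinity for a zero derivative is only a placeholder, since an actual Metalibm call needs a finite target.

```python
import math


def local_targets(total_target, roundoff_budget, max_abs_derivs):
    """Split the error left after arithmetic roundoff equally among the calls,
    then convert each share into a local target at the call site.

    max_abs_derivs[i] is assumed to be max |df/dy_i| over the input domain.
    """
    remaining = total_target - roundoff_budget
    if remaining <= 0:
        raise ValueError("roundoff alone exhausts the error budget")
    share = remaining / len(max_abs_derivs)  # equal distribution of the budget
    targets = []
    for d in max_abs_derivs:
        # Linear approximation: end-of-program error ~= |df/dy_i| * local error.
        targets.append(share / d if d > 0 else math.inf)
    return targets
```

For example, with a total target of 1.0, a roundoff budget of 0.25 and two calls whose maximal derivatives are 2.0 and 0.5, each call receives a share of 0.375, giving local targets 0.1875 and 0.75: the call whose error is amplified more must be approximated more tightly.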

Since Metalibm usually generates approximations with slightly tighter error bounds than asked for, our tool performs a second roundoff analysis (step 4), where all errors (smaller or larger) are correctly taken into account.

*Polynomial Degree Selection.* The polynomial degree influences the efficiency of approximations significantly and in a discrete way, so optimal prediction is infeasible. Hence, our tool performs a linear search over degrees, using the (coarse) running-time estimate reported by Metalibm (obtained with a few benchmarking runs) to select the approximation with the smallest estimated running time. The search stops either when the estimated running time is significantly higher than the current best, or when Metalibm times out.
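A minimal sketch of this degree search, assuming a hypothetical callback `estimate(deg)` that stands in for one Metalibm run. The text does not quantify "significantly higher", so the `slack` factor of 1.5 and the degree range are illustrative assumptions, as is treating a degree that cannot meet the target error as "try the next degree".

```python
import math


def select_degree(estimate, first=2, last=30, slack=1.5):
    """Linear search over polynomial degrees.

    estimate(deg) returns an estimated running time, returns None if no
    approximation of that degree meets the target error, and raises
    TimeoutError if the (hypothetical) Metalibm call times out.
    """
    best_deg, best_time = None, math.inf
    for deg in range(first, last + 1):
        try:
            t = estimate(deg)
        except TimeoutError:
            break                      # stop the search on a timeout
        if t is None:
            continue                   # degree too low for the target error
        if t < best_time:
            best_deg, best_time = deg, t
        elif t > slack * best_time:
            break                      # significantly slower than the best so far
    return best_deg, best_time
```

Because the search stops at the first significantly slower degree, a faster approximation at a much higher degree can be missed; this mirrors the trade-off of a linear search with early termination.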

We do not automatically exploit Metalibm's other parameters, such as the minimum subdomain width for splitting, since they give fine-grained control that is not suitable for *general* automatic implementations.

#### **3 Experimental Evaluation**

We evaluate our approach in terms of accuracy and performance on benchmarks from the literature [9,19,28] that include elementary function calls, and extend them with the examples rodriguesRotation<sup>3</sup>, ex2\_\* and ex3\_d, which are problems from a graduate analysis textbook. While these benchmarks are relatively short, they represent important kernels that usually employ several elementary function calls<sup>4</sup>. We base target error bounds on the roundoff errors of a libm implementation: middle and large errors are roughly three and four orders of magnitude larger than the libm-based bound, respectively. By default, we assume double (64-bit) precision.

Our tool automatically generates benchmarking code for each input program. Each benchmarking executable runs the Daisy-generated code on 10<sup>7</sup> random inputs from the input domain and measures performance in processor clock cycles. Of the measured cycle counts, we discard the highest 10%, as we have observed these to be outliers.
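The cycle counts themselves come from the generated C benchmarking harness, which is not reproduced here. The Python sketch below shows only the post-processing described above: discarding the slowest 10% of runs before computing the mean and standard deviation. The function name and the use of the population standard deviation are assumptions.

```python
import statistics


def summarize_cycles(samples, discard=0.10):
    """Drop the slowest `discard` fraction of runs, then return (mean, std dev)."""
    if not samples:
        raise ValueError("no measurements")
    ordered = sorted(samples)
    keep = len(ordered) - int(len(ordered) * discard)  # number of runs retained
    kept = ordered[:keep]
    return statistics.mean(kept), statistics.pstdev(kept)
```

A single extreme outlier among ten runs, as in `[100] * 9 + [5000]`, is removed entirely and does not affect the reported mean.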

*Experimental Results.* By default, we approximate individual elementary function calls separately, use equal error distribution and allow table-based approximations with an 8-bit table index. For large errors we also measure performance for: (i) default settings but with the derivative-based error distribution; (ii) default settings but without table usage; (iii) default settings but with compound calls with depth 1 and depth ∞ (approximation 'as much as possible').

Table 1 shows the performance improvements of approximated code w.r.t. libm-based implementations of our benchmarks. We compare against libm only, as no approximation or synthesis tool provides error guarantees. By removing the libm calls from the initial programs, we roughly estimate the elementary function overhead (second column), which gives an idea of the margin for improvement. Figure 1 illustrates the overall improvement that we obtain for each benchmark (the height of the bars) and the relative distribution of the running time between arithmetic (blue) and elementary functions (green), for large errors with default settings but approximating compound calls with depth = ∞.

Our tool generates code with significant performance improvements for most functions and often reduces the time spent evaluating elementary functions by a factor of two. As expected, the improvements are overall better for larger errors; they vary on average from 10.7% to 13.8% for individual calls depending on the settings, and reach 17.1% on average when approximating compound calls as much as possible. However, increasing the program target error (for equal error distribution, the Metalibm target error increases linearly with it) does not necessarily lead to better performance, e.g. in the case of axisRotationY and rodriguesRotation. This is the result of discrete decisions concerning the approximation degrees and the domain splittings inside Metalibm.

<sup>3</sup> https://en.wikipedia.org/wiki/Rodrigues%27_rotation_formula.

<sup>4</sup> Experiments are performed on a Debian Linux 9 desktop machine with a 3.3 GHz Intel i5 processor and 16 GB of RAM. All benchmarking code is compiled with GNU's g++, version 6.3.0, with the -O2 flag.


**Table 1.** Performance improvements (in percent) of approximated code w.r.t. a program with libm library function calls.

Somewhat surprisingly, we did not observe an advantage of the derivative-based error distribution over the equal one. We suspect that this is due to the nonlinear nature of Metalibm's heuristics.

Table 1 further demonstrates that the use of tables generally improves performance. However, the influence of increasing the table size must be studied on a case-by-case basis, since large tables might lead to memory-bound computations.

We observe that it is generally beneficial to approximate 'as much as possible'. Indeed, the power of Metalibm lies in generating (piece-wise) polynomial approximations of compound expressions, whose behavior might be much simpler than that of their individual subexpressions.

Finally, we also considered an implementation where all data and arithmetic operations are in single precision apart from the double-precision Metalibm-generated code (whose output is accurate only to single precision). We observe that slight performance improvements are possible, i.e. Metalibm can compete even with single-precision libm-based code, but achieving performance improvements comparable to those for double-precision code would require single-precision code generation in Metalibm.

**Fig. 1.** Average performance and standard deviation. For each benchmark, the first bar shows the running time of the libm-based implementation and the second one of our implementation. Even relatively small overall time improvements are significant w.r.t. the time portion we can optimize (in green). Our implementations also have significantly smaller standard deviation (black bars). (Color figure online)

*Analysis Time.* The analysis time depends heavily on the number of required approximations of elementary functions: each approximation requires a separate call to Metalibm, whose running time in turn depends on the problem definition. Daisy reduces the number of calls to Metalibm by common subexpression elimination, which improves the analysis time. Currently, we set the timeout for each Metalibm call to 3 min, which keeps the overall analysis time reasonable. Overall, our tool takes between 15 s and 20 min to approximate whole programs, with an average running time of 4 min 40 s per program.

#### **4 Conclusion**

We presented a fully automated approach which improves the performance of small numerical kernels at the expense of some accuracy by generating custom approximations of elementary functions. Our tool is parametrized by a user-given whole-program absolute error bound which is guaranteed to be satisfied by the generated code. Experiments illustrate that the tool efficiently uses the available margin for improvement and provides significant speedups for double-precision implementations. This work provides a solid foundation for future research in the areas of automatic approximations of single-precision and multivariate functions.

**Acknowledgments.** The authors thank Christoph Lauter for useful discussions and Youcef Merah for the work on an early prototype.

#### **References**

1. Bornholt, J., Torlak, E., Grossman, D., Ceze, L.: Optimizing synthesis with metasketches. In: POPL (2016)


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Verification

## **Formal Verification of Quantum Algorithms Using Quantum Hoare Logic**

Junyi Liu<sup>1,2</sup>, Bohua Zhan<sup>1,2</sup>(B), Shuling Wang<sup>1</sup>(B), Shenggang Ying<sup>1</sup>, Tao Liu<sup>1</sup>, Yangjia Li<sup>1</sup>, Mingsheng Ying<sup>1,3,4</sup>, and Naijun Zhan<sup>1,2</sup>

<sup>1</sup> State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China

{liujy,bzhan,wangsl,yingsg,liut,yangjia,znj}@ios.ac.cn

<sup>2</sup> University of Chinese Academy of Sciences, Beijing, China

<sup>3</sup> University of Technology Sydney, Sydney, Australia

<sup>4</sup> Tsinghua University, Beijing, China

**Abstract.** We formalize the theory of quantum Hoare logic (QHL) [TOPLAS 33(6),19], an extension of Hoare logic for reasoning about quantum programs. In particular, we formalize the syntax and semantics of quantum programs in Isabelle/HOL, write down the rules of quantum Hoare logic, and verify the soundness and completeness of the deduction system for partial correctness of quantum programs. As preliminary work, we formalize some necessary mathematical background in linear algebra, and define tensor products of vectors and matrices on quantum variables. As an application, we verify the correctness of Grover's search algorithm. To the best of our knowledge, this is the first time a Hoare logic for quantum programs is formalized in an interactive theorem prover, and used to verify the correctness of a nontrivial quantum algorithm.

#### **1 Introduction**

Due to the rapid progress of quantum technology in recent years, it is predicted that practical quantum computers can be built within 10–15 years. Especially during the last 3 years, breakthroughs have been made in quantum hardware. Programmable superconducting quantum computers and trapped-ion quantum computers have been built in universities and companies [1,3,4,6,23].

In another direction, intensive research on quantum programming has been conducted in the last decade [16,45,51,53], as surveyed in [27,52]. In particular, several quantum programming languages have been defined and their compilers have been implemented, including Quipper [31], Scaffold [35], QWire [47], Microsoft's LIQUi|⟩ [25] and Q# [57], IBM's OpenQASM [22], Google's Cirq [30], ProjectQ [56], Chisel-Q [40], Quil [55] and Q|SI⟩ [39]. These developments allow quantum programs to be run first on an ideal simulator for testing, and then on physical devices [5]. For instance, many small quantum algorithms and protocols have already been programmed and run on IBM's simulators and quantum computers [1,2].

Clearly, simulators can only be used for testing, which shows the correctness of a program on one or a few inputs, not its correctness under all possible inputs. Various theories and tools have been developed to formally reason about quantum programs for all inputs on a fixed number of qubits. Equivalence checking [7,8], termination analysis [38], reachability analysis [64], and invariant generation [62] can be used to verify the correctness or termination of quantum programs. Unfortunately, the size of quantum programs to which these tools are applicable is quite limited. This is because all of these tools still perform calculations over the entire state space, which for quantum algorithms has size exponential in the number of qubits. For instance, even on the best supercomputers today, simulation of a quantum program is restricted to about 50–60 qubits. Most model-checking algorithms, which need to perform calculations on operators over the state space, are restricted to 25–30 qubits with the current computing resources.

Deductive program verification presents a way to solve this state space explosion problem. In deductive verification, we do not attempt to execute the program or explore its state space. Rather, we define the semantics of the program using precise mathematical language, and use mathematical reasoning to prove the correctness of the program. These proofs are checked on a computer (for example, in proof assistants such as Coq [15] or Isabelle [44]) to ensure a very high level of confidence.

To apply deductive reasoning to quantum programs, it is necessary to first define a precise semantics and proof system. There has already been a lot of work along these lines [9,20,21,61]. A recent result in this direction is *quantum Hoare logic* (QHL) [61]. It extends to sequential quantum programs the Floyd-Hoare-Naur inductive assertion method for reasoning about correctness of classical programs. QHL is proved to be (relatively) complete for both partial correctness and total correctness of quantum programs.

In this paper, we formalize the theory of quantum Hoare logic in Isabelle/HOL, and use it to verify a non-trivial quantum algorithm – Grover's search algorithm<sup>1</sup>. In more detail, the contributions of this paper are as follows.


<sup>1</sup> Available online at https://www.isa-afp.org/entries/QHLProver.html.

order. Another significant part of our work is to define the tensor product of vectors and matrices, in a way that can be used to extend and combine operations on quantum variables in a consistent way. Finally, we implement algorithms to automatically prove identities in linear algebra to ease the formalization process.

The organization of the rest of the paper is as follows. Section 2 gives a brief introduction to quantum Hoare logic. Section 3 describes in detail our formalization of QHL in Isabelle/HOL. Section 4 describes the application to Grover's algorithm. Section 5 discusses automation techniques, and gives some idea about the cost of the formalization. Section 6 reviews some related work. Finally, we conclude in Sect. 7 with a discussion of future directions of work.

We expect theorem proving techniques will play a crucial role in formal reasoning about quantum computing, as they did for classical computing, and we hope this paper will be one of the first steps in its development.

#### **2 Quantum Hoare Logic**

In this section, we briefly recall the basic concepts and results of quantum Hoare logic (QHL). We only introduce the proof system for partial correctness, since the one for total correctness is not formalized in our work. In addition, we make two simplifications compared to the original work: we consider only variables with finite dimension, and we remove the initialization operation. The complete version of QHL can be found in [61].

In QHL, the number of quantum variables is pre-set before each run of the program. Each quantum variable $q_i$ has dimension $d_i$. The (pure) state of the quantum variable takes values in a complex vector space of dimension $d_i$. The overall (pure) state takes values in the tensor product of the vector spaces for the variables, which has dimension $d = \prod_i d_i$. The mixed state for variable $q_i$ (resp. overall) is given by a $d_i \times d_i$ (resp. $d \times d$) matrix satisfying certain conditions (making them *partial density operators*). The notation $\overline{q}$ is used to denote some finite sequence of distinct quantum variables (called a *quantum register*). We denote the vector space corresponding to $\overline{q}$ by $\mathcal{H}_{\overline{q}}$.

The syntax of quantum programs is given by the following grammar:

$$S ::= \mathbf{skip} \mid \overline{q} := U\overline{q} \mid S_1; S_2 \mid \mathbf{measure} \; M[\overline{q}] : \overline{S} \mid \mathbf{while} \; M[\overline{q}] = 1 \; \mathbf{do} \; S$$

where



Quantum programs can be regarded as quantum extensions of classical **while** programs. The **skip** statement does nothing, which is the same as in the classical case. The unitary transformation changes the state of $\overline{q}$ according to $U$. It is the counterpart to the assignment operation in classical programming languages. The sequential composition is similar to its classical counterpart. The measurement statement is the quantum generalisation of the classical case statement **if** $(\square m \cdot b_m \rightarrow S_m)$ **fi**. The loop statement is a quantum generalisation of the classical loop **while** $b$ **do** $S$.


**Fig. 1.** Proof system *qPD* for partial correctness

Formally, the denotational semantics for quantum programs is defined as a super-operator $\llbracket S \rrbracket(\cdot)$, assigning to each quantum program $S$ a mapping between partial density operators. As usual, the denotational semantics is defined by induction on the structure of the quantum program:


The correctness of a quantum program S is expressed by a quantum extension of the Hoare triple {P}S{Q}, where the precondition P and the postcondition Q are matrices satisfying certain conditions for *quantum predicates* [24]. The semantics for partial correctness is defined as follows:

$$\models_{par} \{ P \} S \{ Q \} \text{ iff } \operatorname{tr}(P\rho) \le \operatorname{tr}(Q \llbracket S \rrbracket(\rho)) + \operatorname{tr}(\rho) - \operatorname{tr}(\llbracket S \rrbracket(\rho))$$

for all partial density operators ρ. Here tr is the trace of a matrix. The semantics for total correctness is defined similarly:

$$\models_{tot} \{ P \} S \{ Q \} \text{ iff } \operatorname{tr}(P\rho) \le \operatorname{tr}(Q \llbracket S \rrbracket(\rho)).$$

We note that they become the same when the quantum program S is terminating, i.e. $\operatorname{tr}(\llbracket S \rrbracket(\rho)) = \operatorname{tr}(\rho)$ for all partial density operators ρ.
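To make the difference between the two correctness notions concrete, here is a small Python sketch (plain nested lists of complex numbers, not the Isabelle formalization) that evaluates both inequalities for a single given ρ. Validity proper quantifies over all partial density operators, which no finite check can establish. The example programs are assumptions chosen for illustration: q := Xq with the NOT gate X, whose semantics is ρ ↦ XρX†, and a program that never terminates, modelled as the zero super-operator.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def adjoint(a):
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def tr(a):
    return sum(a[i][i] for i in range(len(a))).real


def partial_ok(P, Q, sem, rho, tol=1e-12):
    """tr(P rho) <= tr(Q [[S]](rho)) + tr(rho) - tr([[S]](rho)), for this one rho."""
    out = sem(rho)
    return tr(matmul(P, rho)) <= tr(matmul(Q, out)) + tr(rho) - tr(out) + tol


def total_ok(P, Q, sem, rho, tol=1e-12):
    """tr(P rho) <= tr(Q [[S]](rho)), for this one rho."""
    return tr(matmul(P, rho)) <= tr(matmul(Q, sem(rho))) + tol


X = [[0j, 1 + 0j], [1 + 0j, 0j]]   # NOT gate (unitary)
P0 = [[1 + 0j, 0j], [0j, 0j]]      # projector onto |0>
P1 = [[0j, 0j], [0j, 1 + 0j]]      # projector onto |1>


def sem_not(rho):
    """Semantics of q := X q, i.e. rho -> X rho X^dagger."""
    return matmul(matmul(X, rho), adjoint(X))


def sem_diverge(rho):
    """A program that never terminates maps every rho to the zero operator."""
    return [[0j, 0j], [0j, 0j]]
```

With P = Q = |0⟩⟨0| and ρ = |0⟩⟨0|, the diverging program satisfies the partial-correctness inequality (the lost trace tr(ρ) − tr(⟦S⟧(ρ)) = 1 absorbs the left-hand side) but not the total-correctness one.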

The proof system *qPD* for partial correctness of quantum programs is given in Fig. 1. The soundness and (relative) completeness of *qPD* is proved in [61]:

**Theorem 1.** *The proof system qPD is sound and (relative) complete for partial correctness of quantum programs.*

#### **3 Formalization in Isabelle/HOL**

In this section, we describe the formalization of quantum Hoare logic in Isabelle/HOL. Isabelle/HOL [44] is an interactive theorem prover based on higher-order logic. It provides a flexible language in which one can state and prove theorems in all areas of mathematics and computer science. The proofs are checked by the Isabelle kernel according to the rules of higher-order logic, providing a very high level of confidence in the proofs. A standard application of Isabelle/HOL is the formalization of program semantics and Hoare logic. See [43] for a description of the general technique, applied to a very simple classical programming language.

#### **3.1 Preliminaries in Linear Algebra**

Our work is based on the linear algebra library developed by Thiemann and Yamada in the AFP entry [58]. We also use some results on the construction of tensor products in another AFP entry by Bentkamp [13].

In these libraries, the type *'a vec* of vectors with entries in type *'a* is defined as pairs (n, f), where n is a natural number, and f is a function from natural numbers to *'a*, such that f(i) is undefined when i ≥ n. Likewise, the type *'a mat* of matrices is defined as triples (nr, nc, f), where nr and nc are natural numbers, and f is a function from pairs of natural numbers to *'a*, such that f(i, j) is undefined when i ≥ nr or j ≥ nc. The terms *carrier_vec n* (resp. *carrier_mat m n*) represent the set of vectors of length n (resp. matrices of dimension m × n). In our work, we focus almost exclusively on the case where *'a* is the complex numbers. For this case, existing libraries already define concepts such as the adjoint of a matrix, and the (complex) inner product between two vectors. We further define concepts such as Hermitian and unitary matrices, and prove their basic properties.

A key result in linear algebra that is necessary for our work is the Schur decomposition theorem. It states that any complex n × n matrix A can be written in the form QUQ<sup>−1</sup>, where Q is unitary and U is upper triangular. In particular, if A is normal (that is, if AA† = A†A), then A is diagonalizable. A version of the Schur decomposition theorem is formalized in [58], showing that any matrix is similar to an upper-triangular matrix U. However, it does not show that Q can be made unitary. We complete the proof of the full theorem, following the outline of the previous proof.

Next, we define the key concept of positive semi-definite matrices (called positive matrices from now on for simplicity). An n × n matrix A is positive if v†Av ≥ 0 for any vector v. We formalize the basic theory of positive matrices, in particular showing that any positive matrix is Hermitian.

Density operators and partial density operators are then defined as follows:

**definition** *density_operator A* ←→ *positive A* ∧ *trace A =* 1

**definition** *partial_density_operator A* ←→ *positive A* ∧ *trace A* ≤ 1

Next, the Löwner partial order is defined as a partial order on the type *complex mat* as follows:

**definition** *lowner_le* (**infix** ≤<sub>L</sub> 65) **where**

*A* ≤<sub>L</sub> *B* ←→ *dim_row A = dim_row B* ∧ *dim_col A = dim_col B* ∧ *positive (B* − *A)*
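For 2 × 2 matrices these definitions can be checked directly. The Python sketch below is our own illustration, not the Isabelle development: it relies on the fact that a 2 × 2 Hermitian matrix is positive semi-definite exactly when its trace and determinant are both non-negative, and it fixes all dimensions to 2 (the Isabelle definition of ≤<sub>L</sub> additionally compares row and column counts).

```python
def is_hermitian(a, tol=1e-12):
    n = len(a)
    return all(abs(a[i][j] - a[j][i].conjugate()) <= tol
               for i in range(n) for j in range(n))


def is_positive_2x2(a, tol=1e-12):
    """v^dagger A v >= 0 for all v. For 2x2 Hermitian A this holds iff
    trace(A) >= 0 and det(A) >= 0 (then both eigenvalues are non-negative)."""
    if not is_hermitian(a, tol):
        return False
    trace = (a[0][0] + a[1][1]).real
    det = (a[0][0] * a[1][1] - a[0][1] * a[1][0]).real
    return trace >= -tol and det >= -tol


def is_partial_density_operator_2x2(a, tol=1e-12):
    return is_positive_2x2(a, tol) and (a[0][0] + a[1][1]).real <= 1 + tol


def lowner_le_2x2(a, b, tol=1e-12):
    """A <=_L B iff B - A is positive."""
    diff = [[b[i][j] - a[i][j] for j in range(2)] for i in range(2)]
    return is_positive_2x2(diff, tol)


# Example operators: projectors onto |0> and |1>, and the identity.
P0 = [[1 + 0j, 0j], [0j, 0j]]
P1 = [[0j, 0j], [0j, 1 + 0j]]
I2 = [[1 + 0j, 0j], [0j, 1 + 0j]]
```

Since neither P1 − P0 nor P0 − P1 is positive, the projectors onto |0⟩ and |1⟩ are incomparable, which is why the Löwner order is only a partial order.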

A key result that we formalize states that under the Löwner partial order, any non-decreasing sequence of partial density operators has a least upper bound, which is the pointwise limit of the operators when written as n × n matrices. This is used to define the infinite sum of matrices, necessary for the semantics of the while loop.

#### **3.2 Syntax and Semantics of Quantum Programs**

We now begin with the definition of syntax and semantics of quantum programs. First, we describe how to model states of a quantum program. Recall that each quantum program operates on a fixed set of quantum variables q<sub>i</sub>, where each q<sub>i</sub> has dimension d<sub>i</sub>. This information can be recorded in a locale [33] as follows:

```
locale state_sig =
  fixes dims :: "nat list"
```
The total dimension d is given as follows (here *prod_list* denotes the product of a list of natural numbers):

**definition** *d = prod_list dims*

The (mixed) state of the system is given by a partial density operator with dimension d × d. Hence, we declare

```
type_synonym state = "complex mat"
```
**definition** *density_states :: state set* **where** *density_states =* {ρ ∈ *carrier_mat d d. partial_density_operator* ρ}

Next, we define the concept of quantum programs. They are declared as an inductively-defined datatype in Isabelle/HOL, following the grammar given in Sect. 2.

```
datatype com =
  SKIP
```
At this stage, we assume that all matrices involved operate on the global state (that is, all of the quantum variables). We will define commands that operate on a subset of quantum variables later. Measurement is defined over any finite number of matrices. Here *Measure n f C* is a measurement with n options, *f i* for i<n are the measurement matrices, and *C* ! *i* is the command to be executed when the measurement yields result i. Likewise, the first argument to *While* gives measurement matrices, where only the first two values are used.

Next, we define well-formedness and denotation of quantum programs. The predicate *well_com :: com* ⇒ *bool* expresses the well-formedness condition. For a quantum program to be well-formed, all matrices involved should have the right dimension, the argument to *Utrans* should be unitary, and the measurements for *Measure* and *While* should satisfy the condition $\sum_i M_i^\dagger M_i = I_n$. Denotation is written as *denote :: com* ⇒ *state* ⇒ *state*, defined as in Sect. 2. Both *well_com* and *denote* are defined by induction over the structure of the program. The details are omitted here.
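The two numerical side conditions of well-formedness can be sketched in Python as follows (dimension checks omitted). The function names are hypothetical, and the checks use a floating-point tolerance, unlike the exact conditions stated in the formalization.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def adjoint(a):
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def is_identity(a, tol=1e-12):
    n = len(a)
    return all(len(row) == n for row in a) and all(
        abs(a[i][j] - (1 if i == j else 0)) <= tol for i in range(n) for j in range(n))


def is_unitary(u, tol=1e-12):
    """Side condition on the argument of Utrans: U^dagger U = I."""
    return is_identity(matmul(adjoint(u), u), tol)


def is_measurement(ms, tol=1e-12):
    """Side condition on Measure and While: sum_i M_i^dagger M_i = I."""
    n = len(ms[0])
    total = [[0j] * n for _ in range(n)]
    for m in ms:
        mm = matmul(adjoint(m), m)
        total = [[total[i][j] + mm[i][j] for j in range(n)] for i in range(n)]
    return is_identity(total, tol)


# Example operators: Hadamard gate and computational-basis projectors.
h = 2 ** -0.5
H = [[h + 0j, h + 0j], [h + 0j, -h + 0j]]
P0 = [[1 + 0j, 0j], [0j, 0j]]
P1 = [[0j, 0j], [0j, 1 + 0j]]
```

The computational-basis measurement {P0, P1} satisfies the condition, while {P0} alone does not, since P0†P0 = P0 is not the identity.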

#### **3.3 Hoare Triples**

In this section, we define the concept of Hoare triples, and state what needs to be proved for soundness and completeness of the deduction system. First, the concept of quantum predicates is defined as follows:

**definition** *is_quantum_predicate P* ←→ *P* ∈ *carrier_mat d d* ∧ *positive P* ∧ *P* ≤<sub>L</sub> 1<sub>m</sub> *d*

With this, we can give the semantic definition of Hoare triples for partial and total correctness. These definitions are intended for the case where P and Q are quantum predicates, and S is a well-formed program. They define which Hoare triples are *valid*.

**definition** *hoare_total_correct* (|=<sub>t</sub> {(1\_)}/ (\_)/ {(1\_)} 50) **where** |=<sub>t</sub> {*P*} *S* {*Q*} ←→ *(∀ρ ∈ density_states. trace (P \* ρ) ≤ trace (Q \* denote S ρ))*

**definition** *hoare_partial_correct* (|=<sub>p</sub> {(1\_)}/ (\_)/ {(1\_)} 50) **where** |=<sub>p</sub> {*P*} *S* {*Q*} ←→ *(∀ρ ∈ density_states. trace (P \* ρ) ≤ trace (Q \* denote S ρ) + (trace ρ − trace (denote S ρ)))*

Next, we define which Hoare triples are *provable* in the *qPD* system. A Hoare triple for partial correctness is provable (written as ⊢<sub>p</sub> {*P*} *S* {*Q*}) if it can be derived by combining the rules in Fig. 1. This condition can be defined in Isabelle/HOL as an inductive predicate. The definition largely parallels the formulae shown in the figure.

With these definitions, we can state and prove soundness and completeness of the Hoare rules for partial correctness. Note that the statement for completeness is very simple, seemingly without needing to state "relative to the theory of the field of complex numbers". This is because we are taking a shallow embedding for predicates, hence any valid statement on complex numbers, in particular positivity of matrices, is in principle available for use in the deduction system (for example, in the assumption to the **order** rule).

**theorem** *hoare_partial_sound*: ⊢<sub>p</sub> {*P*} *S* {*Q*} =⇒ *well_com S* =⇒ |=<sub>p</sub> {*P*} *S* {*Q*}

**theorem** *hoare_partial_complete:*


The soundness of the Hoare rules is proved by induction on the predicate ⊢<sub>p</sub>, showing that each rule is sound with respect to |=<sub>p</sub>. Completeness is proved using the concept of weakest preconditions, following [61].

#### **3.4 Partial States and Tensor Products**

So far in our development, all quantum operations act on the entire global state. However, for the actual applications, we are more interested in operations that act on only a few of the quantum variables. For this, we need to define an *extension* operator that takes a matrix on the quantum state for a subset of the variables and extends it to a matrix on all of the variables. More generally, we need to define tensor products on vectors and matrices defined over disjoint sets of variables. These need to satisfy various consistency properties, in particular commutativity and associativity of the tensor product. Note that directly using the Kronecker product is not enough, as the matrix to be extended may act on any (possibly non-adjacent) subset of variables, and we need to distinguish between all possible cases.

Before presenting the definition, we first review some preliminaries. We make use of existing work in [13], in particular their encode and decode operations, and emulate their definitions of *matricize* and *dematricize* (used in [13] to convert between tensors represented as a list and matrices). Given a list of dimensions d<sub>i</sub>, the encode and decode operations (named *digit_encode* and *digit_decode*) produce a correspondence between lists of indices a<sub>i</sub> satisfying a<sub>i</sub> < d<sub>i</sub> for each i < n, and a natural number less than ∏<sub>i</sub> d<sub>i</sub>. This works in a way similar to finding the binary representation of a number (in which case all "dimensions" are 2). The list operation *nths xs S* constructs the subsequence of *xs* containing only the elements at indices in the set *S*.
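A Python sketch of the encode/decode correspondence and of *nths*. We assume that the first index is the least significant digit of a mixed-radix numeral; the digit order in the AFP library may differ, and the names merely mirror the Isabelle ones.

```python
def digit_encode(dims, n):
    """Index n < prod(dims) -> list of per-variable indices a_i < dims[i]."""
    digits = []
    for d in dims:
        digits.append(n % d)
        n //= d
    return digits


def digit_decode(dims, digits):
    """Inverse of digit_encode: mixed-radix digits -> single index."""
    n = 0
    for d, a in zip(reversed(dims), reversed(digits)):
        n = n * d + a
    return n


def nths(xs, s):
    """Subsequence of xs keeping only the elements at indices in the set s."""
    return [x for i, x in enumerate(xs) if i in s]
```

With dims = [2, 3], index 5 corresponds to the index list [1, 2], since 5 = 1 + 2 · 2.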

The locale *partial_state* extends *state_sig*, adding *vars* for a subset of quantum variables. Our goal is to define the tensor product of two vectors or matrices over *vars* and its complement −*vars*, respectively.

**locale** *partial_state = state_sig* + **fixes** *vars :: nat set*

First, *dims1* and *dims2* are dimensions of variables *vars* and *-vars*:

**definition** *dims1 = nths dims vars*

**definition** *dims2 = nths dims (*−*vars)*

The operation *encode1* (resp. *encode2* ) provides the map from the product of *dims* to the product of *dims1* (resp. *dims2* ).

**definition** *encode1 i = digit_decode dims1 (nths (digit_encode dims i) vars)*

**definition** *encode2 i = digit_decode dims2 (nths (digit_encode dims i) (*−*vars))*

With this, tensor products on vectors and matrices are defined as follows (here *d* is the product of *dims*).

**definition** *tensor vec :: 'a vec* ⇒ *'a vec* ⇒ *'a vec* **where** *tensor vec v1 v2 = Matrix.vec d (*λ*i. v1 \$ encode1 i \* v2 \$ encode2 i)*

**definition** *tensor mat :: 'a mat* ⇒ *'a mat* ⇒ *'a mat* **where** *tensor mat m1 m2 = Matrix.mat d d (*λ*(i,j). m1 \$\$ (encode1 i, encode1 j) \* m2 \$\$ (encode2 i, encode2 j))*
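The two definitions can be transcribed into a Python sketch on plain nested lists. The helpers are restated, the least-significant-first digit order is assumed, and `local_index` is a hypothetical name that plays the role of both *encode1* and *encode2*. The example places a vector on the non-adjacent qubits 0 and 2, a case the plain Kronecker product does not cover directly.

```python
from math import prod


def digit_encode(dims, a):
    # least-significant-first mixed-radix digits (assumed convention)
    digits = []
    for d in dims:
        digits.append(a % d)
        a //= d
    return digits


def digit_decode(dims, digits):
    a = 0
    for d, x in reversed(list(zip(dims, digits))):
        a = a * d + x
    return a


def nths(xs, s):
    return [x for i, x in enumerate(xs) if i in s]


def local_index(dims, s, i):
    """Index of global basis state i restricted to the variables in s."""
    return digit_decode(nths(dims, s), nths(digit_encode(dims, i), s))


def tensor_vec(dims, vars_, v1, v2):
    """v1 lives on vars_, v2 on the complement; the result lives on all variables."""
    comp = set(range(len(dims))) - set(vars_)
    return [v1[local_index(dims, vars_, i)] * v2[local_index(dims, comp, i)]
            for i in range(prod(dims))]


def tensor_mat(dims, vars_, m1, m2):
    """m1 acts on vars_, m2 on the complement; entry (i, j) is a product of entries."""
    comp = set(range(len(dims))) - set(vars_)
    d = prod(dims)
    return [[m1[local_index(dims, vars_, i)][local_index(dims, vars_, j)]
             * m2[local_index(dims, comp, i)][local_index(dims, comp, j)]
             for j in range(d)] for i in range(d)]


# Three qubits: v1 = |11> on qubits 0 and 2, v2 = |0> on qubit 1.
print(tensor_vec([2, 2, 2], {0, 2}, [0, 0, 0, 1], [1, 0]))  # [0, 0, 0, 0, 0, 1, 0, 0]
```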

We prove the basic properties of *tensor vec* and *tensor mat*, including that they behave correctly with respect to identity, multiplication, adjoint, and trace.
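The multiplication and trace properties can be spot-checked numerically. This Python sketch is not a proof; it uses the same assumed digit order and random integer matrices, so that comparisons are exact, and checks (A₁ ⊗ B₁)(A₂ ⊗ B₂) = A₁A₂ ⊗ B₁B₂ and tr(A ⊗ B) = tr(A)·tr(B) when A acts on the middle variable only.

```python
import random
from math import prod


def digit_encode(dims, a):
    digits = []
    for d in dims:  # least significant digit first (assumed)
        digits.append(a % d)
        a //= d
    return digits


def digit_decode(dims, digits):
    a = 0
    for d, x in reversed(list(zip(dims, digits))):
        a = a * d + x
    return a


def nths(xs, s):
    return [x for i, x in enumerate(xs) if i in s]


def tensor_mat(dims, vars_, m1, m2):
    comp = set(range(len(dims))) - set(vars_)

    def loc(s, i):
        return digit_decode(nths(dims, s), nths(digit_encode(dims, i), s))

    d = prod(dims)
    return [[m1[loc(vars_, i)][loc(vars_, j)] * m2[loc(comp, i)][loc(comp, j)]
             for j in range(d)] for i in range(d)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def rand_mat(rng, n):
    return [[rng.randint(-3, 3) for _ in range(n)] for _ in range(n)]


rng = random.Random(1)
dims, vars_ = [2, 3, 2], {1}                  # vars_ is the middle variable only
a1, a2 = rand_mat(rng, 3), rand_mat(rng, 3)   # act on vars_ (dimension 3)
b1, b2 = rand_mat(rng, 4), rand_mat(rng, 4)   # act on the complement (2 * 2)

mult_ok = (matmul(tensor_mat(dims, vars_, a1, b1), tensor_mat(dims, vars_, a2, b2))
           == tensor_mat(dims, vars_, matmul(a1, a2), matmul(b1, b2)))
trace_ok = trace(tensor_mat(dims, vars_, a1, b1)) == trace(a1) * trace(b1)
print(mult_ok, trace_ok)  # True True
```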

Extension of matrices is a special case of the tensor product, where the matrix on −*vars* is the identity (here *d2* is the product of *dims2*).

**definition** *mat extension :: 'a mat ⇒ 'a mat* **where** *mat extension m = tensor mat m (1<sub>m</sub> d2)*

With *mat extension*, we can define "partial" versions of quantum program commands *Utrans*, *Measure* and *While*. They take a set of variables q as an extra parameter, and all matrices involved act on the vector space associated to q. These commands are named *Utrans P*, *Measure P* and *While P*. They are usually used in place of the global commands in actual applications.

More generally, we can define the tensor product of vectors and matrices on any two subsets of quantum variables. For this, we define another locale:

**locale** *partial state2 = state sig* + **fixes** *vars1 :: nat set* **and** *vars2 :: nat set* **assumes** *disjoint: vars1* ∩ *vars2 =* {}

To use *tensor mat* to define the tensor product in this more general setting, we need to find the relative positions of the variables *vars1* within *vars1* ∪ *vars2*. This is done using *ind in set A x*, which gives the position of *x* within *A*, i.e. the number of elements of *A* smaller than *x*.

**definition** *ind in set A x = card* {*i. i* ∈ *A* ∧ *i* < *x*} **definition** *vars1' = (ind in set (vars1* ∪ *vars2)) ' vars1*
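For instance, with the arbitrarily chosen *vars1* = {1, 4} and *vars2* = {0, 2, 5}, the union is {0, 1, 2, 4, 5}, so variable 1 sits at position 1 and variable 4 at position 3. A short Python sketch of *ind in set* and *vars1'* (the Python names are illustrative):

```python
def ind_in_set(a, x):
    """Position of x within the set a: the number of elements of a smaller than x."""
    return len({i for i in a if i < x})


vars1, vars2 = {1, 4}, {0, 2, 5}
# vars1' = image of ind_in_set (vars1 | vars2) over vars1
vars1_rel = {ind_in_set(vars1 | vars2, x) for x in vars1}
print(sorted(vars1_rel))  # [1, 3]
```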

Finally, the more general tensor products are defined as follows. Note that since we are now outside the *partial state* locale, we must use qualified names for *tensor vec* and *tensor mat*, and supply extra arguments for the locale parameters. Here *dims0 = nths dims (vars1* ∪ *vars2)* is the combined list of dimensions of the variables in *vars1* ∪ *vars2*.

**definition** *ptensor vec :: 'a vec* ⇒ *'a vec* ⇒ *'a vec* **where** *ptensor vec v1 v2 = partial state.tensor vec dims0 vars1' v1 v2*

**definition** *ptensor mat :: 'a mat* ⇒ *'a mat* ⇒ *'a mat* **where** *ptensor mat m1 m2 = partial state.tensor mat dims0 vars1' m1 m2*

The partial extension *pmat extension* is defined in a similar way as before.

**definition** *pmat extension :: 'a mat ⇒ 'a mat* **where** *pmat extension m = ptensor mat m (1<sub>m</sub> d2)*

The definitions *ptensor vec* and *ptensor mat* satisfy several key consistency properties. In particular, they satisfy associativity of tensor product. For matrices, this is expressed as follows:

**theorem** *ptensor mat assoc:*

*v1* ∩ *v2 =* {} =⇒ *(v1* ∪ *v2)* ∩ *v3 =* {} =⇒ *v1* ∪ *v2* ∪ *v3* ⊆ {*0..*<*length dims*} =⇒ *ptensor mat dims (v1* ∪ *v2) v3 (ptensor mat dims v1 v2 m1 m2) m3 = ptensor mat dims v1 (v2* ∪ *v3) m1 (ptensor mat dims v2 v3 m2 m3)*
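The theorem can be sanity-checked on one instance. The Python sketch below implements *ptensor mat* on top of a list-based *tensor mat* (helpers restated, least-significant-first digit order assumed), takes three deliberately interleaved singleton variable sets, and compares both sides of the associativity law on random integer matrices. It is a spot check, not a substitute for the proof.

```python
import random
from math import prod


def digit_encode(dims, a):
    digits = []
    for d in dims:  # least significant digit first (assumed)
        digits.append(a % d)
        a //= d
    return digits


def digit_decode(dims, digits):
    a = 0
    for d, x in reversed(list(zip(dims, digits))):
        a = a * d + x
    return a


def nths(xs, s):
    return [x for i, x in enumerate(xs) if i in s]


def tensor_mat(dims, vars_, m1, m2):
    comp = set(range(len(dims))) - set(vars_)

    def loc(s, i):
        return digit_decode(nths(dims, s), nths(digit_encode(dims, i), s))

    d = prod(dims)
    return [[m1[loc(vars_, i)][loc(vars_, j)] * m2[loc(comp, i)][loc(comp, j)]
             for j in range(d)] for i in range(d)]


def ind_in_set(a, x):
    return len({i for i in a if i < x})


def ptensor_mat(dims, v1, v2, m1, m2):
    """Tensor product of m1 (on v1) and m2 (on v2), living on v1 | v2."""
    u = v1 | v2
    rel = {ind_in_set(u, x) for x in v1}          # vars1' in the text
    return tensor_mat(nths(dims, u), rel, m1, m2)  # dims0 = nths dims (v1 ∪ v2)


def rand_mat(rng, n):
    return [[rng.randint(-3, 3) for _ in range(n)] for _ in range(n)]


rng = random.Random(7)
dims = [2, 3, 2]
v1, v2, v3 = {2}, {0}, {1}  # deliberately out of order
m1, m2, m3 = rand_mat(rng, 2), rand_mat(rng, 2), rand_mat(rng, 3)

lhs = ptensor_mat(dims, v1 | v2, v3, ptensor_mat(dims, v1, v2, m1, m2), m3)
rhs = ptensor_mat(dims, v1, v2 | v3, m1, ptensor_mat(dims, v2, v3, m2, m3))
print(lhs == rhs)  # True
```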

Together, these constructions and consistency properties provide a framework in which one can reason about arbitrary tensor products of vectors and matrices defined on mutually disjoint sets of quantum variables.

#### **3.5 Case Study: Products of Hadamard Matrices**

In this section, we illustrate the above framework for tensor product of matrices with an application, to be used in the verification of Grover's algorithm in the next section.

In many quantum algorithms, we need to deal with the tensor product of an arbitrary number of Hadamard matrices. The Hadamard matrix (denoted *hadamard* in Isabelle) is given by:

$$H = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1\\ 1 & -1 \end{bmatrix}$$

For example, in Grover's algorithm, we need to apply the Hadamard transform on each of the first n quantum variables, given by *vars1* . A single Hadamard transform on the i'th quantum variable, extended to a matrix acting on the first n quantum variables, is defined as follows:

**definition** *hadamard on i :: nat* ⇒ *complex mat* **where** *hadamard on i i = pmat extension dims* {*i*} *(vars1* − {*i*}*) hadamard*

The effect of consecutively applying the Hadamard transform on each of the first n quantum variables is equivalent to multiplying the quantum state by *exH k (n* − *1)*, where *exH k* is defined as follows.

**fun** *exH k :: nat* ⇒ *complex mat* **where** *exH k 0 = hadamard on i 0* | *exH k (Suc k) = exH k k \* hadamard on i (Suc k)*

Crucially, this matrix product of extensions of Hadamard matrices must equal the tensor product of Hadamard matrices. That is, with *H k* defined as

**fun** *H k :: nat* ⇒ *complex mat* **where** *H k 0 = hadamard* | *H k (Suc k) = ptensor mat dims* {*0..*<*Suc k*} {*Suc k*} *(H k k) hadamard*

we have the theorem

**lemma** *exH eq H: exH k (n* − *1) = H k (n* − *1)*

The proof of this result is by induction, requiring the use of associativity of tensor product stated above.
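The statement can also be checked numerically for small n. The Python sketch below (list-based helpers, assumed digit order, and an arbitrary loop-counter dimension appended to *dims*) builds both *exH k (n − 1)* and *H k (n − 1)*. It uses the unnormalised matrix [[1, 1], [1, −1]]: each side contains exactly n Hadamard factors, so the 1/√2 factors contribute the same scalar 2<sup>−n/2</sup> to both, and dropping them keeps the comparison exact.

```python
from math import prod


def digit_encode(dims, a):
    digits = []
    for d in dims:  # least significant digit first (assumed)
        digits.append(a % d)
        a //= d
    return digits


def digit_decode(dims, digits):
    a = 0
    for d, x in reversed(list(zip(dims, digits))):
        a = a * d + x
    return a


def nths(xs, s):
    return [x for i, x in enumerate(xs) if i in s]


def tensor_mat(dims, vars_, m1, m2):
    comp = set(range(len(dims))) - set(vars_)

    def loc(s, i):
        return digit_decode(nths(dims, s), nths(digit_encode(dims, i), s))

    d = prod(dims)
    return [[m1[loc(vars_, i)][loc(vars_, j)] * m2[loc(comp, i)][loc(comp, j)]
             for j in range(d)] for i in range(d)]


def ptensor_mat(dims, v1, v2, m1, m2):
    u = v1 | v2
    rel = {len({y for y in u if y < x}) for x in v1}  # ind_in_set u x
    return tensor_mat(nths(dims, u), rel, m1, m2)


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


HAD = [[1, 1], [1, -1]]  # Hadamard without the 1/sqrt(2) factor


def hadamard_on_i(dims, n, i):
    """pmat_extension dims {i} (vars1 - {i}) hadamard, with vars1 = {0..n-1}."""
    return ptensor_mat(dims, {i}, set(range(n)) - {i}, HAD, identity(2 ** (n - 1)))


def exH(dims, n, k):
    if k == 0:
        return hadamard_on_i(dims, n, 0)
    return matmul(exH(dims, n, k - 1), hadamard_on_i(dims, n, k))


def H_k(dims, k):
    if k == 0:
        return HAD
    return ptensor_mat(dims, set(range(k)), {k}, H_k(dims, k - 1), HAD)


n = 3
dims = [2] * n + [5]  # n qubits plus a loop counter of (arbitrary) dimension 5
print(exH(dims, n, n - 1) == H_k(dims, n - 1))  # True
```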

#### **4 Verification of Grover's Algorithm**

In this section, we describe our application of the above framework to the verification of Grover's quantum search algorithm [32]. Quantum search algorithms [18,32] concern searching an unordered database for an item satisfying some given property. This property is usually specified by an oracle. In a database of N items, of which M satisfy the property, a classical computer requires on average O(N/M) calls to the oracle to find an item with the property. Grover's algorithm reduces this complexity to O(√(N/M)).

The basic idea of Grover's algorithm is rotation. The algorithm starts from an initial state (vector). At every step, it rotates towards the target state by a small angle. As summarised in [18,19,42], it can be described mathematically by the following equation [42, Eq. (6.12)]:

$$G^k \left| \psi\_0 \right\rangle = \cos(\frac{2k+1}{2}\theta)\left| \alpha \right\rangle + \sin(\frac{2k+1}{2}\theta)\left| \beta \right\rangle,$$

where G represents the operator applied at each step, |ψ<sub>0</sub>⟩ is the initial state, θ = 2 arccos √((N − M)/N), |α⟩ is the bad state (for items not satisfying the property), and |β⟩ is the good state (for items satisfying the property). Thus when θ is very small, i.e., M ≪ N, it takes O(√(N/M)) rounds to reach a target state.
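The closed form can be checked against a direct state-vector simulation. The Python sketch below applies G = (2|ψ<sub>0</sub>⟩⟨ψ<sub>0</sub>| − I)·O, where O flips the sign of the marked items, starting from the uniform superposition, and compares the probability of the good subspace with sin²((2k+1)θ/2). The database size and marked set are arbitrary, and the simulation is independent of the formalized program.

```python
import math


def grover_good_probability(n_items, marked, k):
    """Probability of measuring a marked item after k Grover iterations."""
    amp = [1 / math.sqrt(n_items)] * n_items                       # |psi_0>: uniform
    for _ in range(k):
        amp = [-a if i in marked else a for i, a in enumerate(amp)]  # oracle O
        mean = sum(amp) / n_items
        amp = [2 * mean - a for a in amp]                          # 2|psi_0><psi_0| - I
    return sum(amp[i] ** 2 for i in marked)


N, marked = 64, {5, 17}  # arbitrary example with M = 2 good items
M = len(marked)
theta = 2 * math.acos(math.sqrt((N - M) / N))
for k in range(6):
    simulated = grover_good_probability(N, marked, k)
    closed_form = math.sin((2 * k + 1) / 2 * theta) ** 2
    print(k, round(simulated, 6), round(closed_form, 6))
```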

Originally, Grover's algorithm only handled the case M = 1 [32]. It was immediately generalized to the case of known M using the same idea, and to the case of unknown M with some modifications [18]. Later, the idea was generalized to all invertible quantum processes [19].

The paper [61] uses Grover's algorithm as the main example illustrating quantum Hoare logic. We largely follow its approach in this paper. See also [42, Chapter 6] for a general introduction.

First, we set up a locale for the inputs to the search problem.

**locale** *grover state =* **fixes** *n :: nat* **and** *f :: nat* ⇒ *bool* **assumes** *n: n* > *1* **and** *dimM: card* {*i. i* < *(2::nat) ˆ n* ∧ *f i*} > *0 card* {*i. i* < *(2::nat) ˆ n* ∧ *f i*} < *(2::nat) ˆ n*

Here n is the number of qubits used to represent the items. That is, we assume N = 2<sup>n</sup> items in total. The oracle is represented by the function f, where only its values on inputs less than 2<sup>n</sup> are used. The number of items satisfying the property is given by *M = card* {*i. i* < *N* ∧ *f i*}.

Next, we set up a locale for Grover's algorithm.

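```
locale grover_state_sig = grover_state + state_sig +
  fixes R :: nat and K :: nat
  (* n qubits for the items, then one loop-counter variable of dimension K *)
  assumes dims_def: "dims = replicate n 2 @ [K]"
  (* number of loop iterations, assumed to be an integer (see below) *)
  assumes R: "R = pi / (2 * θ) - 1 / 2"
  assumes K: "K > R"
```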
As in [61], we assume that R = π/(2θ) − 1/2 is an integer. This implies that the quantum algorithm succeeds with probability 1: then (2R + 1)θ/2 = π/2, so after R iterations the state is exactly the good state |β⟩. This condition holds, for example, for all N, M with N = 4M. Since we did not formalize quantum states of infinite dimension, we replace the loop counter, which is infinite dimensional in [61], with a variable of dimension K > R. We also remove the control variable for the oracle used in [61]. Overall, our quantum state consists of n variables of dimension 2 for representing the items, and one variable of dimension K for the loop counter.
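As a quick numeric illustration of the integrality condition (plain Python floating point, with arbitrarily chosen N and M), the sketch below computes θ and R, and the success probability sin²((2k+1)θ/2) after the nearest whole number k of iterations. For N = 4M it gives R = 1 and probability 1; for N = 16, M = 1, R is not an integer and the probability stays below 1.

```python
import math


def grover_rounds(n_items, m_good):
    """Return (theta, R) with theta = 2*arccos(sqrt((N-M)/N)) and R = pi/(2*theta) - 1/2."""
    theta = 2 * math.acos(math.sqrt((n_items - m_good) / n_items))
    return theta, math.pi / (2 * theta) - 0.5


for n_items, m_good in [(16, 4), (64, 16), (16, 1)]:
    theta, r = grover_rounds(n_items, m_good)
    k = round(r)  # nearest whole number of iterations
    success = math.sin((2 * k + 1) / 2 * theta) ** 2
    print(n_items, m_good, round(r, 4), round(success, 4))
```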

We now present the quantum program to be verified. First, the operation that performs the Hadamard transform on each of the first n variables is defined by induction as follows.

**fun** *hadamard n :: nat ⇒ com* **where** *hadamard n 0 = SKIP* | *hadamard n (Suc i) = hadamard n i ;; Utrans (tensor P (hadamard on i i) (1<sub>m</sub> K))*

Here *tensor P* denotes the tensor product of a matrix on the first n variables (of dimension 2<sup>n</sup> × 2<sup>n</sup>) and a matrix on the loop variable (of dimension K × K). Executing this program is equivalent to multiplying the quantum state corresponding to the first n variables by H<sup>⊗n</sup>, as shown in Sect. 3.5.

The body of the loop is given by:

**definition** *D :: com* **where** *D = Utrans P vars1 mat O ;;* *hadamard n n ;; Utrans P vars1 mat Ph ;; hadamard n n ;; Utrans P vars2 (mat incr n)*

where each of the three matrices *mat O*, *mat Ph* and *mat incr* can be defined directly.

**definition** *mat O :: complex mat* **where** *mat O = mat N N (λ(i,j). if i = j then (if f i then 1 else −1) else 0)*

**definition** *mat Ph :: complex mat* **where** *mat Ph = mat N N (λ(i,j). if i = j then if i = 0 then 1 else −1 else 0)*

**definition** *mat incr :: nat ⇒ complex mat* **where** *mat incr n = mat n n (λ(i,j). if i = 0 then (if j = n − 1 then 1 else 0) else (if i = j + 1 then 1 else 0))*
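A Python sketch of the three matrices, transcribed from the definitions above with `f` passed in as an ordinary predicate; the helpers `apply` and `basis` are added for illustration only. It checks that *mat incr* maps |j⟩ to |(j + 1) mod n⟩, i.e. it increments the loop counter cyclically, and that *mat Ph* equals 2|0⟩⟨0| − I.

```python
def mat_O(n_items, f):
    """Diagonal: +1 on items satisfying f, -1 elsewhere."""
    return [[(1 if f(i) else -1) if i == j else 0 for j in range(n_items)]
            for i in range(n_items)]


def mat_Ph(n_items):
    """Diagonal: +1 at index 0, -1 elsewhere."""
    return [[(1 if i == 0 else -1) if i == j else 0 for j in range(n_items)]
            for i in range(n_items)]


def mat_incr(n):
    """Entry (i, j) is 1 exactly when i = (j + 1) mod n."""
    return [[(1 if j == n - 1 else 0) if i == 0 else (1 if i == j + 1 else 0)
             for j in range(n)] for i in range(n)]


def apply(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def basis(n, k):
    return [int(i == k) for i in range(n)]


K = 4
for j in range(K):
    print(j, apply(mat_incr(K), basis(K, j)))  # |j> goes to |(j + 1) mod K>
```

Note that *mat O* flips the sign of the items that do *not* satisfy *f*, which is −1 times the more common oracle convention; the difference is a global phase and does not change measurement probabilities.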

Finally, Grover's algorithm is defined as follows. Since our language has no initialization command, we omit the initialization to zero at the beginning and instead assume in the precondition that the state starts in the zero state.

**definition** *Grover :: com* **where** *Grover = hadamard n n ;; While P vars2 M0 M1 D ;; Measure P vars1 N testN (replicate N SKIP)*

where the measurements for the while loop and at the end of the algorithm are:

**definition** *M0 = mat K K (λ(i,j). if i = j ∧ i ≥ R then 1 else 0)*

**definition** *M1 = mat K K (λ(i,j). if i = j ∧ i < R then 1 else 0)*

**definition** *testN k = mat N N (λ(i,j). if i = k ∧ j = k then 1 else 0)*

We can now state the final correctness result. Let *proj v* be the outer product vv<sup>†</sup>, and *proj k k* be |k⟩⟨k|, where |k⟩ is the k'th basis vector of the vector space corresponding to the loop variable. Let *pre* and *post* be given as follows:

**definition** *pre = proj (vec N (λk. if k = 0 then 1 else 0))*

**definition** *post = mat N N (λ(i, j). if i = j ∧ f i then 1 else 0)*

Then, the (partial) correctness of Grover's algorithm is specified by the following Hoare triple.

**theorem** *grover partial correct:* ⊨<sub>p</sub> {*tensor P pre (proj k 0)*} *Grover* {*tensor P post (1<sub>m</sub> K)*}

We now briefly outline the proof strategy. Following the definition of *Grover* , the proof of the above Hoare triple is divided into three main parts, for the initialization by Hadamard matrices, for the while loop, and for the measurement at the end.

In each part, assertions are first inserted around commands according to the Hoare rules to form smaller Hoare triples. In particular, the precondition of the while loop part is exactly the loop invariant. Moreover, we must show that these assertions are quantum predicates, which involves computing their dimensions, showing that they are positive, and showing that they are bounded by the identity matrix in the Löwner order. Then, these Hoare triples are derived using our deduction system. Before combining them, we have to show that the postcondition of each command equals the precondition of the next one. After that, the three main Hoare triples can be obtained by combining these smaller ones.

After deriving the three Hoare triples above, we prove the Löwner order between the postcondition of each triple and the precondition of the following triple. Afterwards, the triples can be combined into the following Hoare triple:

**theorem** *grover partial deduct:* ⊢<sub>p</sub> {*tensor P pre (proj k 0)*} *Grover* {*tensor P post (1<sub>m</sub> K)*}

Finally, the (partial) correctness of Grover's algorithm follows from the soundness of our deduction system.

#### **5 Discussion**

Compared to classical programs, reasoning about quantum programs is more difficult in every respect. Instead of discrete mathematics in the classical case, even the simplest reasoning about quantum programs involves complex numbers, unitary and positivity properties of matrices, and the tensor product. Hence, it is to be expected that formal verification of quantum Hoare logic and quantum algorithms will take much more effort. In this section, we describe some of the automation that we built to simplify the manual proof, and give some statistics concerning the amount of effort involved in the formalization.

#### **5.1 Automatic Proof of Identities in Linear Algebra**

During the formalization process, we make extensive use of ring properties of matrices. These include commutativity and associativity of addition, associativity of multiplication, and distributivity. Compared to the usual case of numbers, applying these rules for matrices is more difficult in Isabelle/HOL, since they involve extra conditions on dimensions of matrices. For example, the rule for commutativity of addition of matrices is stated as:

**lemma** *comm add mat:*

*A* ∈ *carrier mat nr nc* =⇒ *B* ∈ *carrier mat nr nc* =⇒ *A+B=B+A*

These extra conditions make the rules difficult to apply for standard Isabelle automation. For our work, we implemented our own tactic handling these rules. In addition to the ring properties, we also frequently need to use the cyclic property of trace (e.g. tr(ABC) = tr(BCA)), as well as the properties of adjoint ((AB)† = B†A† and A†† = A). For simplicity, we restrict to identities involving only <sup>n</sup> <sup>×</sup> <sup>n</sup> matrices, where <sup>n</sup> is a parameter given to the tactic.

The tactic is designed to prove equality between two expressions. It works by computing the normal form of both expressions, using ring identities and identities for the adjoint to fully expand each expression into polynomial form. To handle the trace, the expression tr(A<sub>1</sub> ⋯ A<sub>n</sub>) is normalized to put the A<sub>i</sub> that is the largest according to Isabelle's internal term order last. All dimension assumptions are collected and reduced (for example, the assumption *A\*B* ∈ *carrier mat n n* is reduced to *A* ∈ *carrier mat n n* and *B* ∈ *carrier mat n n*).
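The trace normalization step can be sketched in a few lines of Python. This is not the tactic's implementation: Python string order stands in for Isabelle's internal term order, and ties are broken by the first occurrence of the largest factor, so the sketch may miss equalities when that factor occurs more than once (the text does not say how the actual tactic treats that case).

```python
def normalize_trace(factors):
    """Rotate the factors of tr(A1 ... An) so that the largest one comes last.

    Python string order stands in for Isabelle's term order; ties are broken
    by the first occurrence of the maximum.
    """
    if not factors:
        return []
    k = factors.index(max(factors))
    return factors[k + 1:] + factors[:k + 1]


# tr(BCA), tr(CAB) and tr(ABC) all normalize to the same factor order.
for f in (["B", "C", "A"], ["C", "A", "B"], ["A", "B", "C"]):
    print(f, "->", normalize_trace(f))
```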

Overall, the resulting tactic is used 80 times in our proofs. Below, we list some of the more complicated equations resolved by the tactic. The tactic reduces the goal to dimensional constraints on the atomic matrices (e.g. *M* ∈ *carrier mat n n* and *P* ∈ *carrier mat n n* in the first case).

$$\begin{aligned} \text{tr}(MM^\dagger(PP^\dagger)) &= \text{tr}((P^\dagger M)(P^\dagger M)^\dagger) \\ \text{tr}(M_0AM_0^\dagger) + \text{tr}(M_1AM_1^\dagger) &= \text{tr}((M_0^\dagger M_0 + M_1^\dagger M_1)A) \\ H^\dagger(Ph^\dagger(H^\dagger Q_2H)Ph)H &= (HPhH)^\dagger Q_2(HPhH) \end{aligned}$$
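The tactic proves these identities symbolically. As a cheap complementary check, for example to catch a mistyped identity before attempting a proof, both sides can be evaluated on random complex matrices. A standard-library Python sketch (matrix size and random seed are arbitrary):

```python
import cmath
import random


def rand_mat(rng, n):
    return [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
            for _ in range(n)]


def mul(*ms):
    """Product of one or more matrices, left to right."""
    out = ms[0]
    for m in ms[1:]:
        out = [[sum(out[i][k] * m[k][j] for k in range(len(m))) for j in range(len(m[0]))]
               for i in range(len(out))]
    return out


def adj(m):
    """Conjugate transpose."""
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def tr(m):
    return sum(m[i][i] for i in range(len(m)))


def close(x, y):
    return cmath.isclose(x, y, rel_tol=1e-9, abs_tol=1e-9)


def close_mat(a, b):
    return all(close(x, y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


rng = random.Random(0)
M, P, M0, M1, A, H, Ph, Q2 = (rand_mat(rng, 3) for _ in range(8))

eq1 = close(tr(mul(M, adj(M), mul(P, adj(P)))),
            tr(mul(mul(adj(P), M), adj(mul(adj(P), M)))))
eq2 = close(tr(mul(M0, A, adj(M0))) + tr(mul(M1, A, adj(M1))),
            tr(mul(add(mul(adj(M0), M0), mul(adj(M1), M1)), A)))
HPhH = mul(H, Ph, H)
eq3 = close_mat(mul(adj(H), mul(adj(Ph), mul(adj(H), Q2, H), Ph), H),
                mul(adj(HPhH), Q2, HPhH))
print(eq1, eq2, eq3)  # True True True
```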

#### **5.2 Statistics**

Overall, the formalization consists of about 11,500 lines of Isabelle theories. An old version of the proof was developed on and off over two years. The current version was re-developed, reusing some ideas from the old version, and took about 5 person-months. A detailed breakdown of the number of lines for different parts of the proof is given in the following table.


In particular, with the verification framework in place, the proof of correctness for Grover's search algorithm takes just over 3000 lines. While this shows that it is realistic to use the current framework to verify more complicated algorithms such as Shor's algorithm, it is clear that more automation is needed to enable verification on a larger scale.

#### **6 Related Work**

The closest work to our research is Robert Rand's implementation of Qwire in Coq [49,50]. Qwire [47] is a language for describing *quantum circuits*. In this model, quantum algorithms are implemented by connecting together quantum gates, each with a fixed number of bit/qubit inputs and outputs. How the gates are connected is determined by a classical host language, allowing classical control of quantum computation. The work [49] defines the semantics of Qwire in Coq, and uses it to verify quantum teleportation, Deutsch's algorithm, and an example on multiple coin flips to illustrate applicability to a family of circuits. In this framework, program verification proceeds directly from the semantics, without defining a Hoare logic. As in our work, it is necessary to solve the problem of how to define extensions of an operation on a few qubits to the global state. The approach taken in [49] is to use the usual Kronecker product, augmented either by the use of swaps between qubits, or by inserting identity matrices at strategic positions in the Kronecker product.

There are two main differences between [49] and our work. First, quantum algorithms are expressed using quantum circuits in [49], while we use quantum programs with while loops. Models based on quantum circuits have the advantage of being concrete, and indeed most of the earlier quantum algorithms can be expressed directly in terms of circuits. However, several new quantum algorithms are more naturally expressed using while loops, e.g. quantum walks with absorbing boundaries, quantum Bernoulli factory (for random number generation), HHL for systems of linear equations and qPCA (Principal Component Analysis). Second, we formalized a Hoare logic, while [49] uses denotational semantics directly. As in verification of classical programs, Hoare logic encapsulates standard forms of argument for dealing with each program construct. Moreover, the rules of QHL are in weakest-precondition form, allowing the possibility of automated verification condition generation after specifying the loop invariants (although this is not used in the present paper).

Besides Rand's work, quite a few verification tools have been developed for quantum communication protocols. For example, Nagarajan and Gay [41] modeled the BB84 protocol [12] and verified its correctness. Ardeshir-Larijani et al. [7,8] presented a tool for verification of quantum protocols through equivalence checking. Existing tools, such as PRISM [37] and Coq, have been employed to develop verification tools for quantum protocols [17,29]. Furthermore, an automatic tool called Quantum Model-Checker (QMC) has been developed [28,46].

Recently, several specific techniques have been proposed to algorithmically check properties of quantum programs. In [63], the Sharir-Pnueli-Hart method for verifying probabilistic programs [54] has been generalised to quantum programs by exploiting the Schrödinger-Heisenberg duality between quantum states and observables. Termination analysis of nondeterministic and concurrent quantum programs [38] was carried out based on reachability analysis [64]. Invariants can be generated at some steps in quantum programs for debugging and verification of correctness [62]. So far, however, no tools implementing these techniques are available. Another Hoare-style logic for quantum programs was proposed in [36], but without (relative) completeness.

Interactive theorem proving has made significant progress in the formal verification of classical programs and systems. Here, we focus on listing some tools designed for special kinds of systems. EasyCrypt [10,11] is an interactive framework for verifying the security of cryptographic constructs in the computational model. It is developed based on a probabilistic relational Hoare logic to support machine-checked construction and verification of game-based proofs. Recently, verification of hybrid systems via interactive theorem proving has also been studied. KeYmaera X [26] is a theorem prover implementing differential dynamic logic (dL) [48], for the verification of hybrid programs. In [60], a prover has been implemented in Isabelle/HOL for reasoning about hybrid processes described using hybrid CSP [34].

Our work is based on existing formalization of matrices and tensors in Isabelle/HOL. In [59] (with corresponding AFP entry [58]), Thiemann et al. developed the matrix library that we use here. In [14] (with corresponding AFP entry [13]), Bentkamp et al. developed tensor analysis based on the above work, in an effort to formalize an expressivity result of deep learning algorithms.

#### **7 Conclusion**

We formalized quantum Hoare logic in Isabelle/HOL, and verified the soundness and completeness of the deduction system for partial correctness. Using this deduction system, we verified the correctness of Grover's search algorithm. To the best of our knowledge, this is the first formalization of a Hoare logic for quantum programs in an interactive theorem prover.

This work is intended as the first step of a larger project to construct a framework in which one can efficiently verify the correctness of complex quantum programs and systems. In this paper, our focus is on formalizing the mathematical machinery to specify the semantics of quantum programs, and on proving the correctness of quantum Hoare logic. To verify more complicated programs efficiently, better automation is needed at every stage of the proof. We have already begun with some automation for proving identities in linear algebra. In the future, we plan to add automation for handling matrix computations, tensor products, positivity of matrices, etc., all linked together by a verification condition generator.

Another direction of future work is to formalize various extensions of quantum Hoare logic, to deal with classical control, recursion, concurrency, etc., with the eventual goal of being able to verify not only sequential programs, but also concurrent programs and communication systems.

**Acknowledgements.** This research is supported by NSFC under grant Nos. 61625206 and 61732001. Bohua Zhan is supported by the CAS Pioneer Hundred Talents Program under grant No. Y9RC585036. Yangjia Li is supported by NSFC grant No. 61872342.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## SecCSL: Security Concurrent Separation Logic

Gidon Ernst<sup>1</sup> and Toby Murray<sup>2</sup>

<sup>1</sup> LMU Munich, Munich, Germany, gidon.ernst@lmu.de

<sup>2</sup> University of Melbourne, Melbourne, Australia, toby.murray@unimelb.edu.au

Abstract. We present SecCSL, a concurrent separation logic for proving expressive, data-dependent information flow security properties of low-level programs. SecCSL is considerably more expressive, while being simpler, than recent compositional information flow logics that cannot reason about pointers, arrays etc. To capture security concerns, SecCSL adopts a relational semantics for its assertions. At the same time it inherits the structure of traditional concurrent separation logics; thus SecCSL reasoning can be automated via symbolic execution. We demonstrate this by implementing SecC, an automatic verifier for a subset of the C programming language, which we apply to a range of benchmarks.

#### 1 Introduction

Software verification successes abound, whether via interactive proof or via automatic program verifiers. While the former has yielded individual, deeply verified software artifacts [21,24,25] primarily by *researchers*, the latter appears to be having a growing impact on *industrial* software engineering [11,36,39].

At the same time, recent work has heralded major advancements in program logics for reasoning about secure *information flow* [23,33,34]—i.e. whether programs properly protect their secrets—yielding the first general program logics and proofs of information flow security for non-trivial concurrent programs [34]. Yet so far, such logics have remained confined to interactive proof assistants, making them practically inaccessible to industrial developers.

This is not especially surprising. The Covern logic [34], for example, pays for its generality, in terms of expressive security policies, with increased complexity. Worse, these logics reason only over very simple toy programming languages, which even lack support for pointers, arrays, and structures. Their complexity, we argue, hinders proof automation and makes scaling up these logics to real-world languages impractical. How, then, can we leverage the power of existing automatic deductive verification approaches for security proofs?

In this paper we present *Security Concurrent Separation Logic* (SecCSL), which achieves an unprecedented combination of simplicity, power, and ease of automation by capturing core concepts such as data-dependent variable sensitivity [27,31,50], and shared invariants on sensitive memory [34] in the familiar style of Concurrent Separation Logic (CSL) [38], as exemplified in Sect. 2.

Prior work [14,20] has noted the promise of separation logic for reasoning about information flow yet, to date, that promise remains unrealised. Indeed, the only two prior encodings of information flow concepts into separation logics of which we are aware have overlooked crucial features like concurrency [14], and lack the ability to separately specify the sensitivity of *values* and memory *locations*, as we explain in Sect. 2. The logic in [20] lacks soundness arguments altogether, while that of [14] fails to satisfy basic properties needed for automation (see the discussion following Proposition 1).

Designing a logic with the right combination of features, with the right semantics, is therefore non-trivial. To manage this, SecCSL assertions have a *relational* interpretation [6,49] over a standard heap model (Sect. 3). This allows one to canonically encode information flow concepts while maintaining the approach and structure of traditional CSL proofs. To do so we adapt existing proof techniques for the soundness of CSL [46] into a compositional information flow security property (Sect. 4) that, like SecCSL itself, is simple and powerful. We have mechanized the soundness of SecCSL in Isabelle/HOL [37].

To demonstrate SecCSL's ease of use and capacity for automation, we implemented the prototype tool SecC (Sect. 5). We target C because it dominates low-level security-critical code. SecC automates SecCSL reasoning via symbolic execution, in the style of contemporary Separation Logic program verifiers like VeriFast [22], Viper [30], and Infer [10]. SecC correctly analyzes well-known benchmark problems (collected in [17]) within a few milliseconds; and we verify a variant of the CDDC case study [5] from the Covern project. Our Isabelle theories, the open source prototype tool SecC, and examples are available online at https://covern.org/secc [18].

#### 2 An Overview of SecCSL

#### 2.1 Specifying Information Flow Control in SecCSL

Consider the program in Fig. 1. It maintains a global pointer rec to a shared record, protected by the lock mutex. The is\_classified field of the record identifies the confidentiality of the record's data: when is\_classified is true, the value stored in the data field is confidential, and otherwise it is safe to release publicly. The left thread outputs the data in the record whenever it is public by writing to the (memory mapped) output device register pointer OUTPUT\_REG (here also protected by mutex). The right thread updates the record, ensuring its content is not confidential, here by clearing its data.

Suppose assigning a value d to the OUTPUT\_REG register causes d to be output to a publicly visible location. Reasoning that the example is secure then requires capturing that (1) the data field of the record pointed to by rec is confidential precisely when the record's is\_classified field says it is, and (2) data sink OUTPUT\_REG should never have confidential data written to it. Security thus amounts to showing that the example only ever writes non-confidential data into OUTPUT\_REG.

Fig. 1. Example of concurrent information flow.

Condition (1) specifies the sensitivity of a data *value* in memory, whereas condition (2) specifies the sensitivity of the data that a memory *location* (i.e. data sink) is permitted to hold. Prior security separation logics [14,20] reason only about value-sensitivity condition (1) but, as we explain below, both are needed. Like those prior logics, in SecCSL one specifies the sensitivity of the value denoted by an expression e via a security *label* ℓ: the assertion e :: ℓ means that the sensitivity of the value denoted by expression e is at most ℓ. Security labels are drawn from a lattice with top element high (denoting the most confidential information), bottom element low (denoting public information), and ordered via ⊑: ℓ ⊑ ℓ′ means that information labelled with ℓ′ is at least as sensitive as that labelled by ℓ. Using this style of assertion, in conjunction with standard separation logic connectives (explained below), condition (1) can be specified as:

$$\exists c\, d.\; \mathsf{rec} \mapsto (c, d) \land c :: \mathsf{low} \land d :: (c \mathrel{?} \mathsf{high} : \mathsf{low}) \tag{1}$$

Separation logic's points-to predicate e ↦ e′ means the memory location denoted by expression e holds the value denoted by e′. Thus (1) can be read as saying that the rec pointer points to a pair of values (c, d). The first, c (the value of the is\_classified field), is public. The sensitivity of the second, d (the value of the data field), is given by the value of the first: it is high when c is true and low otherwise. SecCSL integrates such reasoning about *value-dependent* sensitivity [27,31,50] neatly with functional properties of low-level data structures, which we think is more natural and straightforward than the approach of [34,35] that keeps the two concerns separate.

Value-sensitivity assertion e :: ℓ is a judgement on the maximum sensitivity of the data *source(s)* from which e has been derived. Location-sensitivity assertions, on the other hand, are used to specify security policies on data *sinks* like OUTPUT\_REG. These assertions augment the separation logic points-to predicate with a security label ℓ, and are used to specify which parts of the memory are observable to the attacker (and so must never contain sensitive information): e ↦<sup>ℓ</sup> e′ means that the value denoted by the expression e′ is present in memory at the location denoted by e, and additionally that the sensitivity of the value stored in that location must never exceed ℓ. Thus in SecCSL, e ↦ e′ abbreviates e ↦<sup>high</sup> e′. In Fig. 1, that OUTPUT\_REG is publicly observable can be specified as:
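The interplay between the two kinds of label can be illustrated with a toy Python model of the two-point lattice. It deliberately ignores the heap and the relational semantics, and the names `data_label` and `may_store` are hypothetical, not part of SecC. A value may be stored in a location only if the value's label is below the location's label, so the data field may be written to the low sink OUTPUT\_REG exactly when is\_classified is false.

```python
from enum import IntEnum


class Label(IntEnum):
    """Two-point security lattice: LOW ⊑ HIGH."""
    LOW = 0
    HIGH = 1


def leq(l1, l2):
    """l1 ⊑ l2: information labelled l2 is at least as sensitive as l1."""
    return l1 <= l2


def data_label(is_classified):
    """Value-dependent label of the data field, (c ? high : low) as in (1)."""
    return Label.HIGH if is_classified else Label.LOW


def may_store(value_label, location_label=Label.HIGH):
    """A location labelled l may only ever hold values whose label is ⊑ l.

    The default HIGH mirrors that an unlabelled points-to abbreviates the high one.
    """
    return leq(value_label, location_label)


OUTPUT_REG = Label.LOW  # the public sink, as in (2)
for c in (False, True):
    print("is_classified =", c, "-> may write data to OUTPUT_REG:",
          may_store(data_label(c), OUTPUT_REG))
```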

$$
\exists v.\; \mathsf{OUTPUT\\_REG} \overset{\mathsf{low}}{\longmapsto} v \tag{2}
$$

#### 2.2 Reasoning in SecCSL

SecCSL judgements have the form:

$$\ell\_A \vdash \{P\} \; c \; \{Q\} \tag{3}$$

Here $\ell\_A$ is the *attacker security level*, c is the (concurrent) program command being executed, and P and Q are the program's pre- and postcondition, respectively. Judgement (3) means that if the program c begins in a state satisfying its precondition P then, when it terminates, the final state will satisfy its postcondition Q. Moreover, analogously to [44], the program is guaranteed to be memory safe. We defer a description of $\ell\_A$ and the implied security property to Sect. 2.3.

As with traditional CSLs, SecCSL is geared towards reasoning about shared-memory programs that use lock-based synchronisation. Each lock l has an associated invariant inv(l), which is simply a predicate, like P or Q in (3), that describes the shared memory that the lock protects. In Fig. 1, where the lock mutex protects the shared pointer rec and OUTPUT\_REG, the associated invariant inv(mutex) is simply the separating conjunction of (1) and (2):

$$\left(\exists c \; d.\; \mathsf{rec} \mapsto (c, d) \land c :: \mathsf{low} \land d :: (c \;?\; \mathsf{high} : \mathsf{low})\right) \star \left(\exists v.\; \mathsf{OUTPUT\\_REG} \overset{\mathsf{low}}{\longmapsto} v\right) \tag{4}$$

Separating conjunction P ⋆ Q asserts that the assertions P and Q both hold and, additionally, that the memory locations referenced by P and Q respectively do not overlap. Thus SecCSL invariants, like SecCSL assertions, describe together both functional properties (e.g. rec is a valid pointer) and security concerns (e.g. the OUTPUT\_REG location is publicly visible) of the shared state.

When acquiring a lock one gets to assume that the lock's invariant holds [38]. Subsequently, when releasing the lock one must prove that the invariant has been re-established. For example, when reasoning about the code of the left thread in Fig. 1, upon acquiring the mutex, SecCSL adds formula (4) to the intermediate assertion, which allows proving that the loop body is secure. When reasoning about the right thread, one must prove that the invariant has been re-established when it releases the mutex. This is why, e.g., the right thread must clear the data field after setting is\_classified to false.

Reasoning in SecCSL proceeds forward over the program text according to the rules in Fig. 4. When execution forks, as in Fig. 1, one reasons over each thread individually. For the left thread of Fig. 1, SecCSL requires proving that the guard of the if-condition is low, i.e. that the program is not branching on a secret (rule If in Fig. 4), since such a branch would correspond to a timing channel (see Sect. 2.3 below). This follows from the part c :: low of invariant (4). Secondly, for the write to OUTPUT\_REG, SecCSL requires that the expression being written to the location OUTPUT\_REG has sensitivity low (rule Store in Fig. 4). This follows from d :: (c ? high : low) in the invariant, which simplifies to d :: low given the guard ¬c ≡ true of the if-statement. Finally, when the right thread releases mutex, one must show that invariant (4) holds for the updated contents of rec (rule Unlock in Fig. 4).

#### 2.3 Security Intuition and Informal Security Property

But what does security mean in SecCSL? In addition to functional correctness, a SecCSL judgement $\ell\_A \vdash \{P\}\; c\; \{Q\}$ implies that the program c does not leak any sensitive information to potential attackers during its execution.

The attacker security level $\ell\_A$ in (3) represents an upper bound on the parts of the program's memory that a potential, passive attacker is assumed to be able to observe before, during, and after the program's execution. Intuitively this encompasses all memory locations whose sensitivity ℓ satisfies $\ell \sqsubseteq \ell\_A$. Which memory locations have such a sensitivity is defined by the *location-sensitivity* assertions in the precondition P and the lock invariants: a memory location *loc* is visible to the $\ell\_A$ attacker iff P or a lock invariant contains some $e \overset{e\_l}{\longmapsto} e'$ such that, in the program's initial state, e evaluates to *loc* and $e\_l$ evaluates to some label ℓ with $\ell \sqsubseteq \ell\_A$ (see Fig. 3).

Which data is sensitive and must not be leaked to the $\ell\_A$ attacker is defined by the *value-sensitivity* assertions in P and the lock invariants: an expression e is sensitive when P or a lock invariant contains some $e :: e\_l$ and, in the program's initial state, $e\_l$ evaluates to some ℓ with $\ell \not\sqsubseteq \ell\_A$. Security, then, requires that in all intermediate states of the program's execution no sensitive data (as defined by value-sensitivity assertions) can be inferred via the attacker-observable memory (as defined by location-sensitivity assertions).

SecCSL proves a *compositional* security property that formalises this intuition (see Definition 3). Since the property needs to be compositional with regard to concurrent execution, the resulting security property is *timing sensitive*: not only must the program never write sensitive data to attacker-observable memory locations, but the times at which it updates these memory locations also cannot depend on sensitive data. It is well known that timing-insensitive security properties are not compositional under standard scheduling models [34,48]. For this reason SecCSL forbids programs from branching on sensitive values. We believe that this restriction could in principle be relaxed in the future via established techniques [28,29].

SecCSL's top-level soundness (Sect. 4) formalises the above intuitive definition of security in the style of traditional *noninterference* [19] that compares two program executions with respect to the observations that can be made by an attacker. SecCSL adopts a *relational* interpretation for the assertions P and Q, and the lock invariants, in which they are evaluated against pairs of execution states. This relational semantics directly expresses the comparison needed for noninterference. As a result, most of the complexities related to SecCSL's soundness are confined to the semantic level, whereas the calculus retains its similarity to standard separation logic and hence its simplicity.

Under this relational semantics (see Fig. 2 in Sect. 3), when a pair of states satisfies an assertion P, the two states agree on the values of all non-sensitive expressions as defined by P (Lemma 1). Noninterference is then stated as Theorem 2: program c with precondition P is secure against the $\ell\_A$-attacker if, whenever it is executed twice from two initial states jointly satisfying P and the lock invariants (and so agreeing on the values of all data assumed to be initially observable to the $\ell\_A$ attacker), then in all intermediate pairs of states reached after running each execution for the same number of steps, the resulting states again agree on the initially $\ell\_A$-visible memory. This definition is timing sensitive as it compares executions that have the same number of steps.
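The two-execution reading of noninterference can be illustrated by brute force on a tiny, finite model. The sketch below assumes that the left thread of Fig. 1 copies the data field to OUTPUT\_REG only when is\_classified is false, and it checks a single step rather than whole executions. The state encoding and all function names are hypothetical; this is an illustration, not SecCSL's proof procedure.

```python
from itertools import product
from typing import Callable, Tuple

# A state is modelled as (c, d, out): the is_classified flag, the data field,
# and the contents of OUTPUT_REG. This is a hypothetical, single-step model.
State = Tuple[bool, int, int]
Step = Callable[[bool, int, int], int]


def guarded_write(c: bool, d: int, out: int) -> int:
    """Assumed shape of the left thread's loop body: copy d to OUTPUT_REG
    only when the record is not classified; return the new OUTPUT_REG."""
    return d if not c else out


def leaky_write(c: bool, d: int, out: int) -> int:
    """A deliberately insecure variant that ignores the guard."""
    return d


def low_equivalent(s1: State, s2: State) -> bool:
    """Do two states agree on everything invariant (4) marks as low?
    c and OUTPUT_REG are always low; d is low exactly when c is false."""
    (c1, d1, out1), (c2, d2, out2) = s1, s2
    if c1 != c2 or out1 != out2:
        return False
    return c1 or d1 == d2


def noninterferent(step: Step, values=range(3)) -> bool:
    """Check that low-equivalent initial states yield equal OUTPUT_REG contents."""
    states = list(product([False, True], values, values))
    for s1, s2 in product(states, repeat=2):
        if low_equivalent(s1, s2) and step(*s1) != step(*s2):
            return False
    return True


print(noninterferent(guarded_write))  # True
print(noninterferent(leaky_write))    # False
```

Enumerating state pairs only scales to toy domains; SecCSL instead establishes the property symbolically, via the proof rules and the relational semantics of Sect. 3.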

#### 3 The Logic SecCSL

#### 3.1 Assertions

Pure expressions e, which do not depend on the heap, are composed of variables x, function applications, equations, and conditional expressions. Pure relational formulas ρ comprise boolean expressions φ, value sensitivity $e :: e\_l$, and relational implication ⇒ (which, without loss of generality, also covers relational ¬, ∧, ∨). We assume a standard first-order many-sorted typing discipline (not elaborated).

$$e ::= x \mid f(e\_1, \ldots, e\_n) \mid e\_1 = e\_2 \mid \phi \;?\; e\_1 : e\_2 \qquad \rho ::= \phi \mid e :: e\_l \mid \rho\_1 \Rightarrow \rho\_2$$

We postulate that the logical signature contains a sort Label, corresponding to the security lattice, with constants low, high : Label and a binary predicate symbol ⊑ : Label × Label → Bool, whose interpretation satisfies the lattice axioms.

SecCSL's assertions P, Q may additionally refer to the heap and thus include the empty heap description, labelled points-to predicates (heap location sensitivity assertions), assertions guarded by (pure) conditionals, ordinary overlapping conjunction as well as separating conjunction, and existential quantification.

$$P ::= \rho \mid \mathsf{emp} \mid e\_p \overset{e\_l}{\longmapsto} e\_v \mid (\phi \;?\; P : Q) \mid P \land Q \mid P \star Q \mid \exists\, x.\; P$$

Disjunction, negation, and implication are excluded because they cause issues for describing the set of heap locations visible to the ℓ-attacker, similarly to the problem of defining heap footprints for non-precise assertions [26,40,41]. These connectives can still occur between pure and relational expressions.

The standard expression semantics $\llbracket e \rrbracket\_s$ evaluates e over a store s, which assigns values to variables x as s(x). The interpretation $f^{\mathcal{A}}$ of a function symbol f is a function, given statically by a logical structure $\mathcal{A}$. Specifically, $\sqsubseteq^{\mathcal{A}}$ is the semantic ordering of the security lattice. We write $s \models \phi$ if $\llbracket \phi \rrbracket\_s = \mathit{true}$.

The relational semantics of assertions, written $(s, h), (s', h') \models\_\ell P$, is defined in Fig. 2 over two states (s, h) and (s′, h′), each consisting of a store and a heap. The semantics is defined against the attacker security level ℓ (called $\ell\_A$ in Sect. 2.3). Stores s and s′ are related via $e :: e\_l$: we require the expression $e\_l$ denoting the sensitivity to coincide on s and s′ and, whenever $\llbracket e\_l \rrbracket\_s \sqsubseteq^{\mathcal{A}} \ell$ holds, e must evaluate to the same value in both states, (7). Heaps are related by $(s, h), (s', h') \models\_\ell e\_p \overset{e\_l}{\longmapsto} e\_v$, which similarly ensures that the two heap fragments are identical, h = h′, when $e\_l$ says so, (9). Conditional assertions φ ? P : Q evaluate to P when φ holds (in both states), and to Q otherwise. The separating conjunction splits both heaps independently, (12). Similarly, the existential quantifier picks two values v and v′, (13). Whether the parts of the split or these two values actually agree depends on the other assertions made.

$$\text{Using the abbreviation } s, h \models e\_p \mapsto e\_v \iff h = \{ \llbracket e\_p \rrbracket\_s \mapsto \llbracket e\_v \rrbracket\_s \}$$

$$(s, h), (s', h') \models\_\ell \mathsf{emp} \iff h = h' = \emptyset \tag{5}$$

$$(s, h), (s', h') \models\_\ell \phi \iff s \models \phi \text{ and } s' \models \phi \tag{6}$$

$$(s, h), (s', h') \models\_\ell e :: e\_l \iff \llbracket e\_l \rrbracket\_s = \llbracket e\_l \rrbracket\_{s'} \text{ and } \left(\llbracket e\_l \rrbracket\_s \sqsubseteq^{\mathcal{A}} \ell \implies \llbracket e \rrbracket\_s = \llbracket e \rrbracket\_{s'}\right) \tag{7}$$

$$(s, h), (s', h') \models\_\ell \rho\_1 \Rightarrow \rho\_2 \iff (s, h), (s', h') \models\_\ell \rho\_1 \text{ implies } (s, h), (s', h') \models\_\ell \rho\_2 \tag{8}$$

$$(s, h), (s', h') \models\_\ell e\_p \overset{e\_l}{\longmapsto} e\_v \iff s, h \models e\_p \mapsto e\_v \text{ and } s', h' \models e\_p \mapsto e\_v \text{ and } (s, h), (s', h') \models\_\ell e\_p :: e\_l \land e\_v :: e\_l \tag{9}$$

$$(s, h), (s', h') \models\_\ell (\phi \;?\; P : Q) \iff \begin{cases} (s, h), (s', h') \models\_\ell P, & \text{if } s \models \phi \text{ and } s' \models \phi \\ (s, h), (s', h') \models\_\ell Q, & \text{otherwise} \end{cases} \tag{10}$$

$$(s, h), (s', h') \models\_\ell P \land Q \iff (s, h), (s', h') \models\_\ell P \text{ and } (s, h), (s', h') \models\_\ell Q \tag{11}$$

$$\begin{aligned} (s, h), (s', h') \models\_\ell P \star Q \iff{} & \text{there are disjoint sub-heaps } h\_1, h\_2 \text{ and } h'\_1, h'\_2 \text{ with } h = h\_1 \uplus h\_2 \text{ and } h' = h'\_1 \uplus h'\_2 \\ & \text{such that } (s, h\_1), (s', h'\_1) \models\_\ell P \text{ and } (s, h\_2), (s', h'\_2) \models\_\ell Q \end{aligned} \tag{12}$$

$$(s, h), (s', h') \models\_\ell \exists\, x.\; P \iff (s(x := v), h), (s'(x := v'), h') \models\_\ell P \text{ for some values } v, v' \tag{13}$$

#### Fig. 2. Relational semantics of assertions.
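Equations (6) and (7) can be prototyped directly if expressions are modelled as Python functions of the store. The sketch below assumes a two-point lattice encoded as the integers 0 (low) and 1 (high); it is only an executable reading of these two equations with hypothetical helper names, not part of SecC.

```python
from typing import Any, Callable, Dict

Store = Dict[str, Any]
Expr = Callable[[Store], Any]  # an expression is modelled by its evaluation function

LOW, HIGH = 0, 1  # assumed two-point lattice, LOW ⊑ HIGH


def leq(l1: int, l2: int) -> bool:
    return l1 <= l2


def sat_pure(phi: Expr, s: Store, s2: Store) -> bool:
    """(6): a boolean expression holds relationally iff it holds in both stores."""
    return bool(phi(s)) and bool(phi(s2))


def sat_label(e: Expr, el: Expr, s: Store, s2: Store, ell: int) -> bool:
    """(7): e :: el holds iff el evaluates identically in both stores and,
    whenever that label is ⊑ ell (the attacker level), e does too."""
    if el(s) != el(s2):
        return False
    return e(s) == e(s2) if leq(el(s), ell) else True


def d(st: Store) -> Any:
    return st["d"]


def d_label(st: Store) -> int:
    return HIGH if st["c"] else LOW  # d :: (c ? high : low)


print(sat_label(d, d_label, {"c": False, "d": 5}, {"c": False, "d": 6}, LOW))  # False
print(sat_label(d, d_label, {"c": True, "d": 1}, {"c": True, "d": 2}, LOW))    # True
```

The second call shows value-dependent sensitivity at work: when c is true in both stores, d is high, so the two states may hold different values of d.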

To capture strong security properties, we require a declarative specification of which heap locations are considered visible to the ℓ-attacker when assertion P

$$\begin{aligned}
\mathrm{lows}\_\ell(\rho, s) &= \emptyset, \qquad \text{notably } \mathrm{lows}\_\ell(e :: e\_l, s) = \emptyset \\
\mathrm{lows}\_\ell(P \star Q, s) &= \mathrm{lows}\_\ell(P \land Q, s) = \mathrm{lows}\_\ell(P, s) \cup \mathrm{lows}\_\ell(Q, s) \\
\mathrm{lows}\_\ell(e\_p \overset{e\_l}{\longmapsto} e\_v, s) &= \begin{cases} \{\llbracket e\_p \rrbracket\_s\}, & \text{if } \llbracket e\_l \rrbracket\_s \sqsubseteq^{\mathcal{A}} \ell \\ \emptyset, & \text{otherwise} \end{cases} \\
\mathrm{lows}\_\ell(\phi \;?\; P : Q, s) &= \begin{cases} \mathrm{lows}\_\ell(P, s), & \text{if } s \models \phi \\ \mathrm{lows}\_\ell(Q, s), & \text{otherwise} \end{cases} \\
\mathrm{lows}\_\ell(\exists\, x.\; P, s) &= \begin{cases} \mathrm{lows}\_\ell(P, s), & \text{if } \forall v.\; \mathrm{lows}\_\ell(P, s) = \mathrm{lows}\_\ell(P, s(x := v)) \\ \emptyset, & \text{otherwise} \end{cases}
\end{aligned}$$

Fig. 3. Low locations of an assertion.

holds in some (initial) state (see Sect. 2.3). We define this set, denoted $\mathrm{lows}\_\ell(P, s)$ for initial store s, in Fig. 3. Note that, by design, the definition does not give a useful result for an existential like $\exists p\; v.\; p \overset{\mathsf{low}}{\longmapsto} v$. This mirrors the usual difficulty of defining footprints for non-precise separation logic assertions [26,40,41]. This restriction is not an issue in practice, as location-sensitivity assertions $e\_p \overset{e\_l}{\longmapsto} e\_v$ are intended to describe the static regions of memory (data sinks) visible to the attacker, for which existential quantification over variables free in $e\_p$ or $e\_l$ is not necessary. A generalization to all precise predicates should be possible.
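The cases of Fig. 3 other than the existential can be transcribed into a small recursive function over an assertion tree. The sketch below is a hypothetical encoding: expressions are Python functions of the store, labels are integers with 0 for low and 1 for high, and the existential case (which quantifies over all values) is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Union

Store = Dict[str, int]
LOW, HIGH = 0, 1  # assumed two-point lattice, LOW ⊑ HIGH


@dataclass(frozen=True)
class Pure:
    """A pure or relational formula ρ; it never contributes locations."""


@dataclass(frozen=True)
class PointsTo:
    """e_p ↦^{e_l} e_v; only the pointer and label expressions matter here."""
    ptr: Callable[[Store], int]
    label: Callable[[Store], int]


@dataclass(frozen=True)
class Star:
    """P ⋆ Q (P ∧ Q is treated identically by lows)."""
    left: "Assertion"
    right: "Assertion"


@dataclass(frozen=True)
class Cond:
    """φ ? P : Q"""
    test: Callable[[Store], bool]
    then: "Assertion"
    other: "Assertion"


Assertion = Union[Pure, PointsTo, Star, Cond]


def lows(p: Assertion, s: Store, ell: int) -> FrozenSet[int]:
    """Locations visible to the ell-attacker, following Fig. 3."""
    if isinstance(p, Pure):
        return frozenset()
    if isinstance(p, PointsTo):
        return frozenset({p.ptr(s)}) if p.label(s) <= ell else frozenset()
    if isinstance(p, Star):
        return lows(p.left, s, ell) | lows(p.right, s, ell)
    if isinstance(p, Cond):
        return lows(p.then if p.test(s) else p.other, s, ell)
    raise TypeError(f"unsupported assertion: {p!r}")


# Hypothetical addresses: rec at 200 (default label high), OUTPUT_REG at 100 (low).
inv = Star(PointsTo(lambda s: 200, lambda s: HIGH),
           PointsTo(lambda s: 100, lambda s: LOW))
print(sorted(lows(inv, {}, LOW)))   # [100]
print(sorted(lows(inv, {}, HIGH)))  # [100, 200]
```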

#### 3.2 Entailments

Although implications between spatial formulas are not part of the assertion language, entailments $P \stackrel{\ell}{\implies} Q$ between assertions still play a role in SecCSL's Hoare-style consequence rule (Conseq in Fig. 4). We discuss entailment now as it sheds useful light on some consequences of SecCSL's relational semantics.

#### Definition 1 (Secure Entailment). $P \stackrel{\ell}{\implies} Q$ *holds iff*

- $(s, h), (s', h') \models\_\ell P$ *implies* $(s, h), (s', h') \models\_\ell Q$ *for all* s, h *and* s′, h′, *and*
- $\mathrm{lows}\_\ell(P, s) \subseteq \mathrm{lows}\_\ell(Q, s)$ *for all* s.

The security level ℓ is used not just in the evaluation of the assertions but also to preserve the ℓ-attacker-visible locations of P in Q. This reflects the intuition that P is stronger than Q, and so Q should make fewer assumptions than P on the limitations of an attacker's observational powers.

#### Proposition 1.

$$e = e' \land e\_l = e'\_l \land e :: e\_l \stackrel{\ell}{\implies} e' :: e'\_l \tag{14}$$

$$e :: e\_l \land e\_l \sqsubseteq e'\_l \land e'\_l :: \ell \stackrel{\ell}{\implies} e :: e'\_l \tag{15}$$

$$e\_l :: \ell \stackrel{\ell}{\implies} c :: e\_l \qquad \text{for a constant } c \tag{16}$$

$$e\_1 :: e\_l \land \dots \land e\_n :: e\_l \stackrel{\ell}{\implies} f(e\_1, \dots, e\_n) :: e\_l \qquad \text{for } n > 0 \tag{17}$$

$$e\_p \overset{e\_l}{\longmapsto} e\_v \land e\_l \sqsubseteq \ell \stackrel{\ell}{\implies} e\_p \overset{e\_l}{\longmapsto} e\_v \land e\_p :: e\_l \land e\_v :: e\_l \tag{18}$$

$$\left(\forall s.\; \mathrm{lows}\_\ell(P, s) = \mathrm{lows}\_\ell(Q, s)\right) \text{ implies } \phi \land (\phi \;?\; P : Q) \stackrel{\ell}{\implies} P \tag{19}$$

$$P \stackrel{\ell}{\implies} P' \text{ and } Q \stackrel{\ell}{\implies} Q' \text{ implies } P \star Q \stackrel{\ell}{\implies} P' \star Q' \tag{20}$$

Entailment (14) in Proposition 1 shows that the sensitivity of values is compatible with equality. This property fails in the security separation logic of [14], where labels are part of the semantics of expressions but are not compared by equality. The second property (15) captures the intuition that less-sensitive data can always be used in contexts where more-sensitive data might be expected (but not vice versa). Recall that $e\_l$ and $e'\_l$ here are expressions. The additional condition $e'\_l :: \ell$ (abusing notation to let the semantic level ℓ stand for some expression that denotes it) guarantees that this expression denotes a meaningful security level, i.e. evaluates identically in both states (cf. (7)). Property (16) encodes that constants do not depend on any state; again the security level expression $e\_l$ must be meaningful, but trivially c :: ℓ when ℓ is constant, too. Value sensitivity is congruent with function application (17). This is not surprising, as functions map arguments that are equal in both states to equal results. Yet, as with (14) above, this property fails in [14], where security labels are attached to values. Note that the reverse entailment is false (e.g. for the constant function λx.c).

Via (18), from $e\_p \overset{e\_l}{\longmapsto} e\_v$ it follows that both the location $e\_p$ and the value $e\_v$ adhere to the level $e\_l$, cf. (9). Note that the antecedent $e\_p \overset{e\_l}{\longmapsto} e\_v$ is repeated in the consequent to ensure that the set of ℓ-attacker-visible locations is preserved. Conditional assertions can be resolved when the test is definite, provided that P and Q describe the same set of public locations, (19), and symmetrically for ¬φ. Finally, separating conjunction is monotone with respect to entailment (20).
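Because pure formulas have an empty set of low locations, Definition 1 reduces for them to an implication between relational predicates, which can be tested by brute force over a small finite domain. The sketch below (hypothetical names, a two-point lattice encoded as 0 and 1, and an illustration rather than a decision procedure) confirms an instance of (17) for f(x, y) = x + y and shows that the reverse direction fails for a constant function.

```python
from itertools import product

LOW, HIGH = 0, 1  # assumed two-point lattice, LOW ⊑ HIGH


def sat_label(e, el, s, s2, ell):
    """Relational semantics (7) of e :: el on a pair of stores."""
    if el(s) != el(s2):
        return False
    return e(s) == e(s2) if el(s) <= ell else True


def label_is(e, el):
    """Build the relational formula e :: el as a predicate on (s, s2, ell)."""
    return lambda s, s2, ell: sat_label(e, el, s, s2, ell)


def entails(premises, conclusion, stores, ell):
    """First condition of Definition 1, checked over all pairs of given stores.
    (Pure formulas have no low locations, so the second condition is trivial.)"""
    for s, s2 in product(stores, repeat=2):
        if all(p(s, s2, ell) for p in premises) and not conclusion(s, s2, ell):
            return False
    return True


def low_label(st):
    return LOW


stores = [{"x": x, "y": y} for x, y in product(range(3), repeat=2)]
x_low = label_is(lambda st: st["x"], low_label)
y_low = label_is(lambda st: st["y"], low_label)
sum_low = label_is(lambda st: st["x"] + st["y"], low_label)  # f(x, y) = x + y
const_low = label_is(lambda st: 7, low_label)                # constant function

print(entails([x_low, y_low], sum_low, stores, LOW))  # True: instance of (17)
print(entails([const_low], x_low, stores, LOW))       # False: reverse fails
```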

#### 3.3 Proof System

We consider a canonical concurrent programming language with shared heap locations protected by locks but without shared variables. Commands c comprise assignments to local variables, heap access (load and store),<sup>1</sup> sequential programming constructs, as well as parallel composition and locking. We assume

<sup>1</sup> Volatile memory locations can be treated analogously to locks by introducing an additional assertion characterizing the part of the heap that is implicitly available to atomic commands. This feature is realized in the Isabelle theories [18] but omitted here in the interests of brevity.

a static collection of valid lock identifiers l, each of which has an assertion as its associated invariant inv(l), characterizing the protected portion of the heap. We describe the program semantics in Sect. 4 as part of the soundness proof.

$$\begin{aligned} c ::={} & x := e \mid x := [e\_p] \mid [e\_p] := e\_v \mid \mathtt{lock}\; l \mid \mathtt{unlock}\; l \\ \mid{} & c\_1; c\_2 \mid c\_1 \parallel c\_2 \mid \mathtt{if}\; b\; \mathtt{then}\; c\_1\; \mathtt{else}\; c\_2 \mid \mathtt{while}\; b\; \mathtt{do}\; c \end{aligned}$$

The SecCSL proof rules are shown in Fig. 4. They extend the standard rules of concurrent separation logic [38] (CSL) by additional side-conditions that amount to information flow checks e :: \_ as part of the respective preconditions.

Similarly to [46], without loss of generality we require that assignments (rules Asg, Load) are always to distinct variables, to avoid renaming in the assertions. In the postcondition of Load, $x :: e\_l$ can be derived by Conseq using (18). Storing to a heap location through an $e\_l$-sensitive location $e\_p \overset{e\_l}{\longmapsto} e\_v$ (rule Store) requires that the value $e\_v$ written to that location admits the corresponding security level $e\_l$ of the location $e\_p$. Note that due to monotonicity (15) the security level does not have to match exactly. The rules for locking are standard [12]. To preclude information leaks through timing channels, execution can branch on non-secret values only. This manifests in side conditions b :: ℓ for the respective branching condition b where, recall, ℓ is the attacker security level (If, While). The logical rule Split picks those two cases where $\llbracket \phi \rrbracket\_s = \llbracket \phi \rrbracket\_{s'}$, ruling out the other two by φ :: ℓ. The consequence rule (Conseq) uses entailment relative to ℓ (Definition 1). Rule Par has the usual proviso that the variables modified in one thread cannot interfere with those relied on by the other thread and its pre-/postcondition.

#### 4 Security Definition and Soundness

The soundness theorem for SecCSL guarantees that if some triple $\ell \vdash \{P\}\; c\; \{Q\}$ is derived using the rules of Fig. 4, then all executions of c started in a state satisfying precondition P are memory *safe*, partially *correct* with respect to postcondition Q, and moreover *secure* with respect to the sensitivity of values as denoted by P and Q, and at all times respect the sensitivity of locations as denoted by P (see Sect. 2.3). Proof outlines are relegated to Appendix B. All results have been mechanised in Isabelle/HOL [37] and are available at [18].

The top-level security property of SecCSL is a noninterference condition [19]. Noninterference as a security property specifies, roughly, that for any pair of executions that start in states that agree on the values of all attacker-observable inputs, the resulting executions will be indistinguishable from the attacker's point of view, i.e. all of the attacker-visible observations will agree. In SecCSL, what is "attacker-observable" depends on the attacker level ℓ. The "inputs" are the expressions e, and the attacker-visible inputs are those expressions e whose sensitivity is given by $e :: \ell'$ judgements in the precondition P for which $\ell' \sqsubseteq \ell$. The attacker-visible observations are the contents of all memory locations in $\mathrm{lows}\_\ell(P, s)$, for initial store s and precondition P. Thus we define when two heaps are indistinguishable to the ℓ-attacker.

Fig. 4. Proof rules of SecCSL.

Definition 2 (ℓ-Equivalence). *Two heaps coincide on a set of locations* A*, written* $h \equiv\_A h'$*, iff for all* a ∈ A*,* a ∈ dom(h) ∩ dom(h′) *and* h(a) = h′(a)*. Two heaps* h *and* h′ *are* ℓ*-equivalent wrt. store* s *and assertion* P *if* $h \equiv\_A h'$ *for* $A = \mathrm{lows}\_\ell(P, s)$*.*
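The heap coincidence relation of Definition 2 is a direct check over the locations in A. In the sketch below heaps are Python dictionaries from integer locations to values; the representation and the example addresses are assumptions made for illustration.

```python
from typing import Dict, Iterable

Heap = Dict[int, int]  # a finite partial map from locations to values


def coincide(h: Heap, h2: Heap, locations: Iterable[int]) -> bool:
    """h ≡_A h2 (Definition 2): every a in A is allocated in both heaps
    and holds the same value in each."""
    return all(a in h and a in h2 and h[a] == h2[a] for a in locations)


h1 = {100: 0, 200: 42}  # hypothetical: OUTPUT_REG at 100, a secret at 200
h2 = {100: 0, 200: 7}
print(coincide(h1, h2, {100}))       # True: the low location agrees
print(coincide(h1, h2, {100, 200}))  # False: location 200 differs
```

Note that a location outside dom(h) ∩ dom(h′) makes the relation fail, matching the requirement a ∈ dom(h) ∩ dom(h′).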

Then, the ℓ-validity of an assertion P in the relational semantics witnesses ℓ-equivalence between the corresponding heaps.

Lemma 1. *If* $(s, h), (s', h') \models\_\ell P$*, then* $h \equiv\_A h'$ *for* $A = \mathrm{lows}\_\ell(P, s)$*.*

Furthermore, if $(s, h), (s', h') \models\_\ell P$, then $\mathrm{lows}\_\ell(P, s) = \mathrm{lows}\_\ell(P, s')$, since the security levels in labelled points-to predicates must coincide on s and s′, cf. (9).

*Semantics.* Semantic configurations, denoted by k in the following, are one of three kinds: (**run** c, L, s, h) denotes a command c in a state s, h where L is a set of locks that are currently not held by any thread and can be acquired by c; (**stop** L, s, h) similarly denotes a final state s, h with residual locks L, and **abort** results from invalid heap access.

The single-step relation $(\mathbf{run}\; c, L, s, h) \xrightarrow{\sigma} k$ takes running configurations to successors k with respect to a schedule σ that resolves the non-determinism of parallel composition. The schedule σ is a list of *actions*: the action τ represents the execution of atomic commands and the evaluation of conditionals; the actions 1 and 2 respectively denote the execution of the left- and right-hand sides of a parallel composition for a single step, and so define a deterministic scheduling discipline reminiscent of separation kernels [32]. For example, $(\mathbf{run}\; c\_1 \parallel c\_2, L, s, h) \xrightarrow{1 \cdot \sigma} (\mathbf{run}\; c'\_1 \parallel c\_2, L', s', h')$ if $(\mathbf{run}\; c\_1, L, s, h) \xrightarrow{\sigma} (\mathbf{run}\; c'\_1, L', s', h')$. Configurations $(\mathbf{run}\; \mathtt{lock}\; l, L, s, h)$ can only be scheduled if $l \in L$ (and symmetrically for unlock); otherwise they block without a possible step.

Executions $k\_1 \xrightarrow{\sigma\_1 \cdots \sigma\_n}{}^{\*} k\_{n+1}$ chain several steps $k\_i \xrightarrow{\sigma\_i} k\_{i+1}$ by accumulating the schedule. We consider partial correctness only, thus the schedule is always finite and so are all executions. The rules for program steps are otherwise standard and can be found in Appendix A.
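To see how a schedule resolves the non-determinism of parallel composition, the sketch below models only atomic commands, sequencing, and parallel composition (no locks, conditionals, loops, or **abort**). Each step is driven by a path of actions such as ("1", "tau"), mirroring 1·σ. The command classes, and the convention that a terminated thread is represented by None inside `Par`, are simplifying assumptions rather than the rules of Appendix A.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple, Union

State = Dict[str, int]
Path = Tuple[str, ...]  # the actions of one step, e.g. ("1", "tau")


@dataclass(frozen=True)
class Atomic:
    """An atomic command; it is executed by the action "tau"."""
    effect: Callable[[State], State]


@dataclass(frozen=True)
class Seq:
    first: "Cmd"
    second: "Cmd"


@dataclass(frozen=True)
class Par:
    """c1 ∥ c2; a side that has terminated is represented by None."""
    left: Optional["Cmd"]
    right: Optional["Cmd"]


Cmd = Union[Atomic, Seq, Par]


def step(c: Cmd, state: State, path: Path) -> Tuple[Optional[Cmd], State]:
    """One step of c under path; returns (successor or None if terminated, state)."""
    if isinstance(c, Atomic):
        if path != ("tau",):
            raise ValueError(f"atomic command cannot take path {path!r}")
        return None, c.effect(state)
    if isinstance(c, Seq):
        rest, state = step(c.first, state, path)
        return (c.second if rest is None else Seq(rest, c.second)), state
    if isinstance(c, Par):
        if not path:
            raise ValueError("empty action path")
        side, sub = path[0], path[1:]
        left, right = c.left, c.right
        if side == "1" and left is not None:
            left, state = step(left, state, sub)
        elif side == "2" and right is not None:
            right, state = step(right, state, sub)
        else:
            raise ValueError(f"action {side!r} cannot be scheduled")
        done = left is None and right is None
        return (None if done else Par(left, right)), state
    raise TypeError(f"unsupported command: {c!r}")


def run(c: Optional[Cmd], state: State, schedule: List[Path]):
    """Chain single steps, consuming the schedule one path per step."""
    for path in schedule:
        if c is None:
            raise ValueError("program has already terminated")
        c, state = step(c, state, path)
    return c, state


inc = Atomic(lambda s: {**s, "x": s["x"] + 1})
dbl = Atomic(lambda s: {**s, "x": s["x"] * 2})
print(run(Par(inc, dbl), {"x": 1}, [("1", "tau"), ("2", "tau")]))  # (None, {'x': 4})
print(run(Par(inc, dbl), {"x": 1}, [("2", "tau"), ("1", "tau")]))  # (None, {'x': 3})
```

Fixing the schedule makes each run deterministic, which is what allows two executions to be compared step by step in lockstep.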

*Compositional Security.* To prove that SecCSL establishes its top-level noninterference condition, we first define a compositional security condition that provides the central characterization of security for a command c with respect to precondition P and postcondition Q. We denote this central, compositional property $\mathrm{secure}^n\_\ell(P, c, Q)$ and formalize it below in Definition 3. It ensures that the first n steps (or fewer if the program terminates before that) are safe and preserve ℓ-equivalence of the heap locations specified initially in P, in a way that is compositional across multiple execution steps, across multiple threads of execution, and across different parts of the heap. It is somewhat akin to, although more precise than, prior characterizations based on *strong low bisimulation* [16,45].

Disregarding for a moment the case when c terminates before the n-th step: for a pair of initial states $(s\_1, h\_1)$ and $(s'\_1, h'\_1)$, an initial set of locks $L\_1$, and a fixed schedule $\sigma = \sigma\_1 \cdots \sigma\_n$, $\mathrm{secure}^{n+1}\_\ell(P\_1, c\_1, Q)$ requires that $c\_1$ performs a sequence of lockstep execution steps from each initial state

$$\begin{aligned} (\mathbf{run}\ c\_i, L\_i, s\_i, h\_i) &\xrightarrow{\sigma\_i} (\mathbf{run}\ c\_{i+1}, L\_{i+1}, s\_{i+1}, h\_{i+1}) \qquad \text{for } 1 \le i \le n\\ (\mathbf{run}\ c\_i, L\_i, s'\_i, h'\_i) &\xrightarrow{\sigma\_i} (\mathbf{run}\ c\_{i+1}, L\_{i+1}, s'\_{i+1}, h'\_{i+1}) \end{aligned} \tag{21}$$

These executions must agree on the intermediate commands $c\_i$ and locks $L\_i$, and the i-th pair of states must satisfy an intermediate assertion of the following form:

$$(s\_i, h\_i), (s'\_i, h'\_i) \models\_\ell P\_i \star F \star \mathrm{invs}(L\_i) \quad \text{where } \mathrm{invs}(L\_i) = \mathop{\bigstar}\_{l\_i \in L\_i} \mathrm{inv}(l\_i) \tag{22}$$

Here $P\_i$ describes the part of the heap that command $c\_i$ is currently accessing. $\mathrm{invs}(L\_i)$ is the separating conjunction of the lock invariants for the locks $l\_i \in L\_i$ not currently acquired. Its presence ensures that whenever a lock is acquired, the associated invariant can be assumed to hold. Finally, F is an arbitrary *frame*, an assertion that does not mention variables updated by $c\_i$. Its inclusion allows the security property to compose with respect to different parts of the heap.

Moreover, each $P\_{i+1} \star \mathrm{invs}(L\_{i+1})$ is required to preserve the sensitivity of all ℓ-visible heap locations of $P\_i \star \mathrm{invs}(L\_i)$, i.e. $\mathrm{lows}\_\ell(P\_i \star \mathrm{invs}(L\_i), s\_i) \subseteq \mathrm{lows}\_\ell(P\_{i+1} \star \mathrm{invs}(L\_{i+1}), s\_{i+1})$. If some intermediate step $m \le n$ terminates, then $P\_{m+1} = Q$, ensuring the postcondition holds when the executions terminate. Lastly, neither execution is allowed to reach an **abort** configuration.

If the initial state satisfies $P\_1 \star F \star \mathrm{invs}(L\_1)$, then (22) holds throughout the entire execution and establishes the end-to-end property that any final state indeed satisfies the postcondition, and that $\mathrm{lows}\_\ell(P\_1 \star \mathrm{invs}(L\_1), s\_1) \subseteq \mathrm{lows}\_\ell(P\_i \star \mathrm{invs}(L\_i), s\_i)$ with respect to the initially specified low locations.

The property $\mathrm{secure}^n\_\ell(P, c, Q)$ is defined recursively to match the steps of the lockstep execution of the program.

#### Definition 3 (Security).

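- $\mathrm{secure}^0\_\ell(P\_1, c\_1, Q)$ *holds always.*
- $\mathrm{secure}^{n+1}\_\ell(P\_1, c\_1, Q)$ *holds iff, for all frames* F*, lock sets* $L\_1$*, and pairs of states with* $(s\_1, h\_1), (s'\_1, h'\_1) \models\_\ell P\_1 \star F \star \mathrm{invs}(L\_1)$*, any two steps* $(\mathbf{run}\; c\_1, L\_1, s\_1, h\_1) \xrightarrow{\sigma} k$ *and* $(\mathbf{run}\; c\_1, L\_1, s'\_1, h'\_1) \xrightarrow{\sigma} k'$ *under the same schedule* σ *satisfy* k ≠ **abort** *and* k′ ≠ **abort***, and there is some* $P\_2$ *for which either*
	- $k = (\mathbf{stop}\; L\_2, s\_2, h\_2)$ *and* $k' = (\mathbf{stop}\; L\_2, s'\_2, h'\_2)$ *and* $P\_2 = Q$*, or*
	- $k = (\mathbf{run}\; c\_2, L\_2, s\_2, h\_2)$ *and* $k' = (\mathbf{run}\; c\_2, L\_2, s'\_2, h'\_2)$ *with* $\mathrm{secure}^n\_\ell(P\_2, c\_2, Q)$*,*

*such that* $(s\_2, h\_2), (s'\_2, h'\_2) \models\_\ell P\_2 \star F \star \mathrm{invs}(L\_2)$ *and* $\mathrm{lows}\_\ell(P\_1 \star \mathrm{invs}(L\_1), s\_1) \subseteq \mathrm{lows}\_\ell(P\_2 \star \mathrm{invs}(L\_2), s\_2)$ *in both cases.*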

Two further side conditions are imposed, ensuring that all mutable shared state lies in the heap (cf. Sect. 3): $c\_1$ does not modify variables occurring in $\mathrm{invs}(L\_1)$ and F (which guarantees that both remain intact), and the free variables of $P\_2$ can only be those already present in $P\_1$, $c\_1$, or any lock invariant (which guarantees that $P\_2$ remains stable against concurrent assignments). Note that each step can pick a different frame F, as required for the soundness of rule Par.

Lemma 2. $\ell \vdash \{P\}\; c\; \{Q\}$ *implies* $\mathrm{secure}^n\_\ell(P, c, Q)$ *for every* n ≥ 0*.*

*Safety, Correctness and Noninterference.* Execution safety and correctness with respect to pre- and postcondition follow straightforwardly from Lemma 2.

Corollary 1 (Safety). *Given initial states* $(s\_1, h\_1), (s'\_1, h'\_1) \models\_\ell P \star \mathrm{invs}(L\_1)$ *and two executions of a command* c *under the same schedule to resulting configurations* k *and* k′*, respectively, then* $\ell \vdash \{P\}\; c\; \{Q\}$ *implies* k ≠ **abort** ∧ k′ ≠ **abort***.*

Theorem 1 (Correctness). *For initial states* $(s\_1, h\_1), (s'\_1, h'\_1) \models\_\ell P \star \mathrm{invs}(L\_1)$*, given two complete executions of a command* c *under the same schedule* σ

$$\begin{aligned} (\mathbf{run}\ c, L\_1, s\_1, h\_1) &\stackrel{\sigma}{\longrightarrow}^\* (\mathbf{stop}\ L\_2, s\_2, h\_2) \\ (\mathbf{run}\ c, L\_1, s'\_1, h'\_1) &\stackrel{\sigma}{\longrightarrow}^\* (\mathbf{stop}\ L\_2, s'\_2, h'\_2) \end{aligned}$$

*then* $\ell \vdash \{P\}\; c\; \{Q\}$ *implies* $(s\_2, h\_2), (s'\_2, h'\_2) \models\_\ell Q \star \mathrm{invs}(L\_2)$*.*

The top-level noninterference property also follows from Lemma 2 via Lemma 1. For brevity, we state the noninterference property directly in the theorem:

Theorem 2 (Noninterference). *Given a command* c *and initial states* $(s\_1, h\_1), (s'\_1, h'\_1) \models\_\ell P \star \mathrm{invs}(L\_1)$*, then* $\ell \vdash \{P\}\; c\; \{Q\}$ *implies* $h\_i \equiv\_A h'\_i$*, where* $A = \mathrm{lows}\_\ell(P \star \mathrm{invs}(L\_1), s\_1)$*, for all pairs of heaps* $h\_i$ *and* $h'\_i$ *arising from executing the same schedule from each initial state.*

#### 5 SecC: Automating SecCSL

To demonstrate the ease with which SecCSL can be automated, we have developed the prototype tool SecC, available at [18]. It implements the logic from Sect. 3 for a subset of C. SecC is currently used to explore reasoning about example programs with interesting security policies. Thus its engineering has focused on features related to security reasoning (e.g. deciding when conditions $e :: e\_l$ are entailed) rather than on reasoning about complex data structures.

*Symbolic Execution.* SecC automates SecCSL through symbolic execution, as pioneered for separation logic in [7]. Similarly to VeriFast's algorithm [22], the verifier computes the strongest postcondition of a command c when executed in a symbolic state, yielding a set of possible final symbolic states. Each such state σ = (ρ, s, P) maintains a path condition ρ of relational formulas (from procedure contracts, invariants, and the evaluation of conditionals) and a symbolic heap described by a list $P = (P\_1 \cdots P\_n)$ of atomic spatial assertions (points-to and instances of defined predicates). The symbolic store s maps program variables to pure expressions, where s(e) denotes substituting s into e. As an example, when $P\_i = s(e\_p) \mapsto v$ is part of the symbolic heap, a load $x := [e\_p]$ in σ can be executed to yield the updated state (ρ, s(x := v), P), where x is mapped to v.
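The load step just described can be sketched in Python as follows. Symbolic terms are plain strings, pointer expressions are restricted to program variables so that s(e) is a store lookup, and chunks are matched by syntactic equality of their left-hand sides. These are illustrative assumptions; SecC's actual data structures are not described here, and a real verifier would also have to account for aliasing, which this sketch ignores.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PointsTo:
    """Atomic spatial assertion ptr ↦ val over symbolic terms (strings here)."""
    ptr: str
    val: str


@dataclass
class SymState:
    path: List[str] = field(default_factory=list)        # ρ: path condition
    store: Dict[str, str] = field(default_factory=dict)  # s: variable → term
    heap: Tuple[PointsTo, ...] = ()                       # P1 ⋆ ... ⋆ Pn


def load(st: SymState, x: str, ep: str) -> SymState:
    """Execute x := [ep] symbolically: find the chunk whose left-hand side
    matches s(ep) and bind x to its right-hand side in a new state."""
    target = st.store[ep]
    for chunk in st.heap:
        if chunk.ptr == target:
            return SymState(list(st.path), {**st.store, x: chunk.val}, st.heap)
    raise ValueError(f"no points-to chunk for {target!r}")


st = SymState(path=["c == 0"], store={"p": "a0"}, heap=(PointsTo("a0", "v0"),))
print(load(st, "x", "p").store)  # {'p': 'a0', 'x': 'v0'}
```

A missing chunk corresponds to a potential memory-safety violation; here it is simply reported as an error.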

To find the $P_i$, we match the left-hand sides of points-to predicates. Matching is similarly used when checking entailments $\rho_1 \land P \Longrightarrow_\ell \exists \bar{x}.\ \rho_2 \land Q$, where the conclusion is normalized to prenex form. The entailment is reduced to a non-spatial problem by incrementally computing a substitution $\tau$ for the existentials $\bar{x}$, removing pairs $P_i = \tau(Q_j)$ in the process, as justified by (20) (see also "subtraction rules" in [7, Sec. 4]).
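The subtraction step can be illustrated by the following Python sketch. It assumes hypothetical tuple terms in which existentials are written `("ex", name)`. It also matches greedily, with no backtracking over alternative pairings, so it is a simplification of what a real verifier may need. Atoms of the premise that are left over are returned, and the caller must account for them. The pure remainder $\rho_1 \Rightarrow \tau(\rho_2)$ is not handled here.

```python
def match(pattern, term, tau):
    """Extend substitution tau (existential name -> term) so that tau(pattern) == term.
    Returns the extended substitution, or None if no extension exists."""
    if pattern[0] == "ex":                       # existential variable
        name = pattern[1]
        if name not in tau:
            return {**tau, name: term}
        return tau if tau[name] == term else None
    if pattern[0] == "sym":                      # symbolic constant: must be identical
        return tau if pattern == term else None
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p_arg, t_arg in zip(pattern[1:], term[1:]):  # same constructor: match arguments
        tau = match(p_arg, t_arg, tau)
        if tau is None:
            return None
    return tau


def subtract(premise, conclusion):
    """Remove pairs P_i = tau(Q_j) one at a time, computing tau incrementally.
    Returns (tau, leftover premise atoms), or None if some Q_j finds no partner."""
    tau, remaining = {}, list(premise)
    for q in conclusion:
        for i, p in enumerate(remaining):
            extended = match(q, p, tau)
            if extended is not None:
                tau = extended
                del remaining[i]
                break
        else:
            return None
    return tau, remaining


a, b = ("sym", "a"), ("sym", "b")
premise = [("pto", a, ("sym", "v")), ("pto", b, ("sym", "w"))]
conclusion = [("pto", b, ("ex", "y")), ("pto", a, ("ex", "z"))]
print(subtract(premise, conclusion))  # ({'y': ('sym', 'w'), 'z': ('sym', 'v')}, [])
```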

Finally, the remaining relational problem $\rho_1 \Rightarrow \rho_2$, which contains no spatial connectives, can be encoded into first-order logic [17] by duplicating the pure formulas in terms of fresh variables that represent the second state, and by the syntactic equivalent of (7). The resulting verification condition is discharged with Z3 [15]. This translation is semantically complete. For example, consider Fig. 4 from Prabawa et al. [43]. It has a conditional if(b == b) ..., whose check $(b = b) :: \mathit{low}$, translated to $(b = b) = (b' = b')$ by SecC, holds independently of $b$'s sensitivity.
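The duplication trick can be seen on the Prabawa et al. example with a small brute-force sketch. Here an assertion $e :: \mathit{low}$ over a pair of states becomes $e(s) = e(s')$. As a simplifying assumption, validity is checked by enumerating a tiny finite domain instead of calling Z3. That is enough to illustrate the translation, but it is not a proof for unbounded integers.

```python
from itertools import product


def low(expr):
    """Relational assertion expr :: low over a pair of states (s, s2). Its first-order
    translation duplicates expr over fresh (primed) variables: expr(s) == expr(s2)."""
    return lambda s, s2: expr(s) == expr(s2)


def valid_on(domain, variables, premise, conclusion):
    """Check premise => conclusion for every pair of states over a finite domain.
    A brute-force stand-in for the Z3 query; not a proof for unbounded integers."""
    n = len(variables)
    for values in product(domain, repeat=2 * n):
        s = dict(zip(variables, values[:n]))    # first state (b)
        s2 = dict(zip(variables, values[n:]))   # second state (b')
        if premise(s, s2) and not conclusion(s, s2):
            return False
    return True


def always(s, s2):
    return True


def cond_b_eq_b(s):  # the test of if(b == b) ...
    return s["b"] == s["b"]


def value_of_b(s):
    return s["b"]


print(valid_on(range(4), ["b"], always, low(cond_b_eq_b)))  # True
print(valid_on(range(4), ["b"], always, low(value_of_b)))   # False
```

The first check succeeds whatever $b$ and $b'$ are, which mirrors the claim that $(b = b) :: \mathit{low}$ holds independently of $b$'s sensitivity. The second check fails because $b$ itself may differ between the two states.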

*Features.* In addition to the logic from Sect. 3, SecC supports procedure-modular verification with pre-/postconditions as usual, as well as user-defined spatial predicates. While some issues of the C source language, such as integer overflow, are not addressed (yet), those that directly impact information flow security are taken into account. Specifically, the short-circuit semantics of the boolean operators &&, || and the ternary \_ ? \_ : \_ count as branching points. As such, the left-hand side (resp. the test) must not depend on sensitive data, similarly to the conditions of if statements and while loops.
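The branching points that short-circuit operators introduce can be collected with a simple traversal. The sketch below uses a hypothetical tuple AST and only gathers the sub-expressions that SecC would then have to show are not sensitive, for example $e :: \ell$ in the current symbolic state. It does not show how that check is discharged.

```python
def branch_obligations(expr):
    """Sub-expressions that act as branching points and so must not be sensitive:
    the left operand of && and ||, and the test of ?: (hypothetical tuple AST)."""
    tag = expr[0]
    if tag in ("var", "const"):
        return []
    if tag in ("&&", "||"):
        _, lhs, rhs = expr
        return [lhs] + branch_obligations(lhs) + branch_obligations(rhs)
    if tag == "?:":
        _, test, then_e, else_e = expr
        return ([test] + branch_obligations(test)
                + branch_obligations(then_e) + branch_obligations(else_e))
    # any other operator evaluates all operands unconditionally: just recurse
    return [ob for sub in expr[1:] for ob in branch_obligations(sub)]


# is_classified && data == 0
e = ("&&", ("var", "is_classified"), ("==", ("var", "data"), ("const", 0)))
print(branch_obligations(e))  # [('var', 'is_classified')]
```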

A direct benefit of the integration of security levels into the assertion language is that it becomes possible to specify the sensitivity of data passed to library and operating system functions. For example, the execution time of malloc(len) would depend on the value of len, which can thus be required to satisfy len :: low by annotating its function header with an appropriate precondition, using SecC's requires annotation. Likewise, SecC can reason about limited forms of declassification, in which external functions are trusted to safely release otherwise sensitive data, by giving them appropriate pre-/postconditions. For example, a password hashing library function prototype might be annotated with a postcondition asserting its result is low, via SecC's ensures annotation.

*Examples and Case Study.* SecC proves Fig. 1 secure, and correctly flags buggy variants as insecure, e.g., where the test in thread 1 is reversed, or when thread 2 does not clear the data field upon setting is\_classified to FALSE. SecC also correctly analyzes those 7 examples from [17] that are supported by the logic and tool (each in ∼10 ms). All examples are available at [18].

To compare SecC and SecCSL against the recent Covern logic [34], we took a non-trivial example program that Murray et al. verified in Covern, manually translated it to C, and verified it automatically using SecC. The original program<sup>2</sup>, written in Covern's tiny While language embedded in Isabelle/HOL, models the software functionality of a simplified implementation of the Cross Domain Desktop Compositor (CDDC) [5]. The CDDC is a device that facilitates interactions with multiple PCs, each of which runs applications at differing sensitivity, from a single keyboard, mouse and display. Its multi-threaded software handles routing of keyboard input to the appropriate PC and switching between the PCs via mouse gestures. Verifying the C translation required adding SecCSL annotations for procedure pre-/postconditions and loop invariants. The C translation including those annotations is ∼250 lines in length. The present, unoptimised, implementation of SecC verifies the resulting artifact in <sup>∼</sup>5 s. In contrast, the Covern proof of this example requires <sup>∼</sup>600 lines of Isabelle/HOL definitions/specification, plus ∼550 lines of Isabelle proof script.

#### 6 Related Work

There has been much work targeting type systems and program logics for concurrent information flow. Karbyshev et al. [23] provide an excellent overview. Here we concentrate on work whose ideas are most closely related to SecCSL.

Costanzo and Shao [14] propose a sequential separation logic for reasoning about information flow. Unlike SecCSL, theirs does not distinguish value and location sensitivity. Their separation logic assertions have a fairly standard (nonrelational) semantics, at the price of having a *security-aware* language semantics that propagates security labels attached to values in the store and heap. As mentioned in Sect. 3.2, this has the unfortunate side-effect of breaking intuitive properties about sensitivity assertions. We conjecture that the absence of such properties would make their logic harder to automate than SecCSL, whose automation SecC demonstrates to be feasible. SecCSL avoids the aforementioned drawbacks by adopting a relational assertion semantics.

<sup>2</sup> https://bitbucket.org/covern/covern/src/master/examples/cddc/Example\_CDDC\_WhileLockLanguage.thy.

Gruetter and Murray [20] propose a security separation logic in Coq [8] for Verifiable C, the C subset of the Verified Software Toolchain [2,3]. However, they provide no soundness proof for its rules, and it is unclear how feasible it is to automate.

Two recent compositional logics for concurrent information flow are the Covern logic [34] and the type and effect system of Karbyshev et al. [23]. Both borrow ideas from separation logic. However, unlike SecCSL, neither is defined for languages with pointers, arrays etc.

Like SecCSL, Covern proves a timing-sensitive security property. Location sensitivity is defined statically by value-dependent predicates, and value sensitivity is tracked by a dependent security typing context Γ [35], relative to a Hoare logic predicate P over the entire shared memory. In Covern locks carry non-relational invariants. In contrast, SecCSL unifies these elements together into separation logic assertions with a relational semantics. Doing so leads to a much simpler logic, amenable to automation, while supporting pointers, etc.

On the other hand, Karbyshev et al. [23] prove a timing-*insensitive* security property, but rely on primitives to interact with the scheduler to prevent leaks via scheduling decisions. Unlike SecCSL, which assumes a deterministic scheduling discipline, Karbyshev et al. support a wider class of scheduling policies. Their system tracks resource ownership and transfer between threads at synchronisation points, similar to CSLs. Their resources include *labelled scheduler resources* that account for scheduler interaction, including when scheduling decisions become tainted by secret data, something that cannot occur in SecCSL's deterministic scheduling model.

Prior logics for sequential languages, e.g. [1,4], have also adopted separation logic ideas to reason locally about memory, combining them with relational assertions similar to SecCSL's e :: e<sup>l</sup> assertions. For instance, the agreement assertions A(e) of [4] coincide with SecCSL's e :: low. Unlike SecCSL, some of these logics support languages with explicit declassification actions [4].

Self-composition is another technique to exploit existing verification infrastructure for proofs of general hyperproperties [13], including but not limited to non-interference. Eilers et al. [17] present such an approach for Viper, which supports an assertion language similar to that of separation logic. It does not support public heap locations (which are information sources and sinks at the same time), although sinks can be modeled via preconditions of procedures. A similar approach is implemented in Frama-C [9]. Neither [9] nor [17] supports concurrency, and it remains unclear how self-composition could avoid an exponential blow-up from concurrent interleaving, which SecCSL avoids.

The soundness proof for SecCSL follows the general structure of Vafeiadis's proof for CSL [46], which is also mechanised in Isabelle/HOL. There is, however, a technical difference: his analog of Definition 3, a recursive predicate $\mathit{safe}_n(c, s, h, Q)$, refers to a semantic initial state $s, h$, whereas we propagate only a syntactic assertion (22). Our formulation has the benefit that some of the technical reasoning in the soundness proof is easier to automate. Its drawback is the need to impose technical side-conditions on the free variables of the frame $F$ and the intermediate assertions $P_i$.

#### 7 Conclusion

We presented SecCSL, a concurrent separation logic for proving expressive data-dependent information flow properties of programs. SecCSL is considerably simpler, yet handles features like pointers, arrays etc., which are out of scope for contemporary logics. It inherits the structure of traditional concurrent separation logics, and so like those logics can be automated via symbolic execution [10,22,30]. To demonstrate this, we implemented SecC, an automatic verifier for expressive information flow security for a subset of the C language.

Separation logic has proved to be a remarkably powerful vehicle for reasoning about programs, weak memory concurrency [47], program synthesis [42], and many other domains. With SecCSL, we hope that in future the same possibilities might be opened to verified information flow security.

Acknowledgement. We thank the anonymous reviewers for their careful and detailed comments that helped significantly to clarify the discussion of finer points.

This research was sponsored by the Department of the Navy, Office of Naval Research, under award #N62909-18-1-2049. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Office of Naval Research.

#### A Command Semantics

Symmetric parallel rules, in which $c_2$ is scheduled under action 2, are omitted.

$$\frac{s' = s(x \mapsto [\![e]\!]_s)}{(\mathbf{run}\ x := e, L, s, h) \xrightarrow{\langle\tau\rangle} (\mathbf{stop}\ L, s', h)}$$

$$\frac{[\![e]\!]_s \in \mathrm{dom}(h) \quad s' = s(x \mapsto h([\![e]\!]_s))}{(\mathbf{run}\ x := [e], L, s, h) \xrightarrow{\langle\tau\rangle} (\mathbf{stop}\ L, s', h)} \qquad \frac{[\![e]\!]_s \notin \mathrm{dom}(h)}{(\mathbf{run}\ x := [e], L, s, h) \xrightarrow{\langle\tau\rangle} \mathbf{abort}}$$

$$\frac{[\![e_1]\!]_s \in \mathrm{dom}(h) \quad h' = h([\![e_1]\!]_s \mapsto [\![e_2]\!]_s)}{(\mathbf{run}\ [e_1] := e_2, L, s, h) \xrightarrow{\langle\tau\rangle} (\mathbf{stop}\ L, s, h')} \qquad \frac{[\![e_1]\!]_s \notin \mathrm{dom}(h)}{(\mathbf{run}\ [e_1] := e_2, L, s, h) \xrightarrow{\langle\tau\rangle} \mathbf{abort}}$$

$$\frac{l \in L \quad L' = L \setminus \{l\}}{(\mathbf{run}\ \mathsf{lock}\ l, L, s, h) \xrightarrow{\langle\tau\rangle} (\mathbf{stop}\ L', s, h)}$$

$$\frac{l \notin L \quad L' = L \cup \{l\}}{(\mathbf{run}\ \mathsf{unlock}\ l, L, s, h) \xrightarrow{\langle\tau\rangle} (\mathbf{stop}\ L', s, h)}$$

$$\frac{(\mathbf{run}\ c_1, L, s, h) \xrightarrow{\sigma} \mathbf{abort}}{(\mathbf{run}\ c_1; c_2, L, s, h) \xrightarrow{\sigma} \mathbf{abort}} \qquad \frac{(\mathbf{run}\ c_1, L, s, h) \xrightarrow{\sigma} \mathbf{abort}}{(\mathbf{run}\ c_1 \parallel c_2, L, s, h) \xrightarrow{1 \cdot \sigma} \mathbf{abort}}$$

$$\frac{(\mathbf{run}\ c_1, L, s, h) \xrightarrow{\sigma} (\mathbf{stop}\ L', s', h')}{(\mathbf{run}\ c_1; c_2, L, s, h) \xrightarrow{\sigma} (\mathbf{run}\ c_2, L', s', h')} \qquad \frac{(\mathbf{run}\ c_1, L, s, h) \xrightarrow{\sigma} (\mathbf{run}\ c_1', L', s', h')}{(\mathbf{run}\ c_1; c_2, L, s, h) \xrightarrow{\sigma} (\mathbf{run}\ c_1'; c_2, L', s', h')}$$

$$\frac{(\mathbf{run}\ c_1, L, s, h) \xrightarrow{\sigma} (\mathbf{stop}\ L', s', h')}{(\mathbf{run}\ c_1 \parallel c_2, L, s, h) \xrightarrow{1 \cdot \sigma} (\mathbf{run}\ c_2, L', s', h')} \qquad \frac{(\mathbf{run}\ c_1, L, s, h) \xrightarrow{\sigma} (\mathbf{run}\ c_1', L', s', h')}{(\mathbf{run}\ c_1 \parallel c_2, L, s, h) \xrightarrow{1 \cdot \sigma} (\mathbf{run}\ c_1' \parallel c_2, L', s', h')}$$

$$\frac{\text{if } s \models b \text{ then } c' = c_1 \text{ else } c' = c_2}{(\mathbf{run}\ \mathsf{if}\ b\ \mathsf{then}\ c_1\ \mathsf{else}\ c_2, L, s, h) \xrightarrow{\langle\tau\rangle} (\mathbf{run}\ c', L, s, h)}$$

$$\frac{s \not\models b}{(\mathbf{run}\ \mathsf{while}\ b\ \mathsf{do}\ c, L, s, h) \xrightarrow{\langle\tau\rangle} (\mathbf{stop}\ L, s, h)} \qquad \frac{s \models b}{(\mathbf{run}\ \underbrace{\mathsf{while}\ b\ \mathsf{do}\ c}_{\omega}, L, s, h) \xrightarrow{\langle\tau\rangle} (\mathbf{run}\ (c; \omega), L, s, h)}$$

$$k \xrightarrow{\langle\rangle} k \qquad \frac{k \xrightarrow{\sigma_1} k' \quad k' \xrightarrow{\sigma_2} k''}{k \xrightarrow{\sigma_1 \cdot \sigma_2} k''}$$
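For readers who prefer executable notation, the command rules above can be transcribed into a small Python interpreter. This is a sketch under our own assumptions, not the authors' Isabelle/HOL formalisation. The command and configuration encodings are hypothetical, expressions are Python functions of the store, and the label $\langle\tau\rangle$ is the tuple `("tau",)`, with $1 \cdot \sigma$ prepending `1`. The schedule is kept as a list of per-step labels rather than one concatenated sequence. The symmetric rules for thread 2 are included.

```python
# Commands (hypothetical tuple encoding); expressions are Python functions of the store s:
#   ("assign", x, e)  ("load", x, e)  ("store", e1, e2)  ("lock", l)  ("unlock", l)
#   ("seq", c1, c2)   ("par", c1, c2) ("if", b, c1, c2)  ("while", b, c)
# Configurations: ("run", c, L, s, h), ("stop", L, s, h) or "abort"; L is a frozenset.
BASE = {"assign", "load", "store", "lock", "unlock", "if", "while"}


def step(cfg, label):
    """One transition from ("run", c, L, s, h). Returns the successor, "abort",
    or None if no rule applies (e.g. locking a lock that is not available)."""
    _, c, L, s, h = cfg
    kind = c[0]
    if kind in BASE and label != ("tau",):
        return None
    if kind == "assign":
        return ("stop", L, {**s, c[1]: c[2](s)}, h)
    if kind == "load":
        a = c[2](s)
        return ("stop", L, {**s, c[1]: h[a]}, h) if a in h else "abort"
    if kind == "store":
        a = c[1](s)
        return ("stop", L, s, {**h, a: c[2](s)}) if a in h else "abort"
    if kind == "lock":
        return ("stop", L - {c[1]}, s, h) if c[1] in L else None
    if kind == "unlock":
        return ("stop", L | {c[1]}, s, h) if c[1] not in L else None
    if kind == "if":
        return ("run", c[2] if c[1](s) else c[3], L, s, h)
    if kind == "while":  # c itself plays the role of omega
        return ("run", ("seq", c[2], c), L, s, h) if c[1](s) else ("stop", L, s, h)
    if kind == "seq":
        nxt = step(("run", c[1], L, s, h), label)
        if nxt is None or nxt == "abort":
            return nxt
        if nxt[0] == "stop":
            return ("run", c[2]) + nxt[1:]
        return ("run", ("seq", nxt[1], c[2])) + nxt[2:]
    if kind == "par":
        if not label or label[0] not in (1, 2):
            return None
        mine, other = (c[1], c[2]) if label[0] == 1 else (c[2], c[1])
        nxt = step(("run", mine, L, s, h), label[1:])
        if nxt is None or nxt == "abort":
            return nxt
        if nxt[0] == "stop":
            return ("run", other) + nxt[1:]
        pair = (nxt[1], c[2]) if label[0] == 1 else (c[1], nxt[1])
        return ("run", ("par",) + pair) + nxt[2:]
    raise ValueError(f"unknown command {c!r}")


def run(cfg, schedule):
    """Follow a schedule, given as a list of per-step labels."""
    for label in schedule:
        if cfg == "abort" or cfg[0] != "run":
            raise RuntimeError("schedule is longer than the execution")
        nxt = step(cfg, label)
        if nxt is None:
            raise RuntimeError(f"no transition under label {label!r}")
        cfg = nxt
    return cfg


prog = ("par", ("assign", "a", lambda s: 1),
               ("seq", ("lock", "l"), ("store", lambda s: s["p"], lambda s: s["x"] + 1)))
start = ("run", prog, frozenset({"l"}), {"p": 100, "x": 5}, {100: 0})
final = run(start, [(2, "tau"), (2, "tau"), ("tau",)])
print(final)  # ('stop', frozenset(), {'p': 100, 'x': 5, 'a': 1}, {100: 6})
```

Running the same schedule from two initial states is exactly the setting of Theorems 1 and 2. A check of noninterference on concrete inputs could call `run` twice and compare the resulting heaps on the low locations.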

#### B Proofs

#### Proof of Lemma 1

If $(s, h), (s', h') \models_\ell P$, then $h \equiv_A h'$ for $A = \mathit{lows}_\ell(P, s)$.

*Proof.* By induction on the structure of $P$, noting that $\mathit{lows}_\ell(\_, s)$ contains locations of the corresponding sub-heap only.

#### Proof of Lemma 2


*Proof (Outline).* By induction on the derivation of the judgement. The case $n = 0$ is trivial; otherwise, we unfold the recursion of the security definition once to prove the base cases of assignment, load, store, and locking, which then follow from the respective side conditions of the proof rules.

For rules If and While, the side condition $b :: \ell$ guarantees that the test evaluates equivalently in the two states, and thus execution proceeds with the same remainder program.

Except for If, all remaining rules need a second induction on n to stepwise match security of the premise to security of the conclusion (e.g. over the steps of the first command in a sequential composition c1; c2).

The rule Frame instantiates the frame $F$ with the same assertion in each step, whereas Par uses the frame $F$ to preserve the current precondition $P_2$ of $c_2$ over steps of $c_1$, and vice versa.

#### Proof of Corollary 1

Given a command $c$ and initial states $(s_1, h_1), (s'_1, h'_1) \models_\ell P \star \mathit{invs}(L_1)$ and two executions under the same schedule to resulting configurations $k$ and $k'$, respectively, then $\vdash_\ell \{P\}\ c\ \{Q\}$ implies $k \neq \mathbf{abort} \land k' \neq \mathbf{abort}$.

*Proof.* By induction on the number of steps $n$ of the executions, from $\mathit{secure}^n_\ell(P, c, Q)$ via Lemma 2.

#### Proof of Theorem 1

Given a command $c$ and initial states $(s_1, h_1), (s'_1, h'_1) \models_\ell P \star \mathit{invs}(L_1)$ and two complete executions under the same schedule $\sigma$

$$\begin{aligned} (\mathbf{run}\ c, L_1, s_1, h_1) &\stackrel{\sigma}{\longrightarrow}{}^{*} (\mathbf{stop}\ L_2, s_2, h_2) \\ (\mathbf{run}\ c, L_1, s'_1, h'_1) &\stackrel{\sigma}{\longrightarrow}{}^{*} (\mathbf{stop}\ L_2, s'_2, h'_2) \end{aligned}$$

then $\vdash_\ell \{P\}\ c\ \{Q\}$ implies $(s_2, h_2), (s'_2, h'_2) \models_\ell Q \star \mathit{invs}(L_2)$.

*Proof.* By induction on the number of steps $n$ of the executions, from $\mathit{secure}^n_\ell(P, c, Q)$ via Lemma 2.

#### Proof of Theorem 2

Given a command $c$ and initial states $(s_1, h_1), (s'_1, h'_1) \models_\ell P \star \mathit{invs}(L_1)$, then $\vdash_\ell \{P\}\ c\ \{Q\}$ implies $h_i \equiv_A h'_i$, where $A = \mathit{lows}_\ell(P \star \mathit{invs}(L_1), s_1)$, for all pairs of heaps $h_i$ and $h'_i$ arising from executing the same schedule from each initial state.

*Proof.* By induction on the number of steps $i$ up to that state, from $\mathit{secure}^i_\ell(P, c, Q)$ via Lemma 2, we have $\mathit{lows}_\ell(P \star \mathit{invs}(L_1), s_1) \subseteq \mathit{lows}_\ell(P_i \star \mathit{invs}(L_1), s_i)$ transitively over the prefix, where $P_i$ and $s_i$ are from the $i$-th state. The theorem then follows from Lemma 1 in Sect. 3.1.

#### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Reachability Analysis for AWS-Based Networks**

John Backes<sup>1</sup>, Sam Bayless<sup>1,4</sup>, Byron Cook<sup>1,2</sup>, Catherine Dodge<sup>1</sup>, Andrew Gacek<sup>1</sup> (B), Alan J. Hu<sup>4</sup>, Temesghen Kahsai<sup>1</sup>, Bill Kocik<sup>1</sup>, Evgenii Kotelnikov<sup>1,3</sup>, Jure Kukovec<sup>1,5</sup>, Sean McLaughlin<sup>1</sup>, Jason Reed<sup>6</sup>, Neha Rungta<sup>1</sup>, John Sizemore<sup>1</sup>, Mark Stalzer<sup>1</sup>, Preethi Srinivasan<sup>1</sup>, Pavle Subotić<sup>1,2</sup>, Carsten Varming<sup>1</sup>, and Blake Whaley<sup>1</sup>

> <sup>1</sup> Amazon, Seattle, USA, gacek@amazon.com; <sup>2</sup> University College London, London, UK; <sup>3</sup> Chalmers University of Technology, Gothenburg, Sweden; <sup>4</sup> University of British Columbia, Vancouver, Canada; <sup>5</sup> TU Wien, Vienna, Austria; <sup>6</sup> Semmle Inc, San Francisco, USA

**Abstract.** Cloud services provide the ability to provision virtual networked infrastructure on demand over the Internet. The rapid growth of these virtually provisioned cloud networks has increased the demand for automated reasoning tools capable of identifying misconfigurations or security vulnerabilities. This type of automation gives customers the assurance they need to deploy sensitive workloads. It can also reduce the cost and time-to-market for regulated customers looking to establish compliance certification for cloud-based applications. In this industrial case-study, we describe a new network reachability reasoning tool, called Tiros, that uses off-the-shelf automated theorem proving tools to fill this need. Tiros is the foundation of a recently introduced network security analysis feature in the *Amazon Inspector* service now available to millions of customers building applications in the cloud. Tiros is also used within Amazon Web Services (AWS) to automate the checking of compliance certification and adherence to security invariants for many AWS services that build on existing AWS networking features.

#### **1 Introduction**

Cloud computing provides on-demand access to IT resources such as compute, storage, and analytics via the Internet with pay-as-you-go pricing. Customers typically network these IT resources together using a growing number of virtual networking features. Amazon Web Services (AWS), for example, today provides over 30 virtualized networking primitives that allow customers to implement a wide variety of cloud-based applications.

Correctly configured networks are a key part of an organization's security posture. Clearly documented and, more importantly, verifiable network design is important for compliance audits, *e.g.* the Payment Card Industry Data Security Standard (PCI DSS) [10]. As the scale and diversity of cloud-based services grows, each new offering used by an organization adds another dimension of possible interaction at the networking level. Thus, customers and auditors increasingly need tooling for the security of their networks that is accurate, automated and scalable, allowing them to automatically detect violations of their requirements.

In this industrial case-study, we describe a new tool, called Tiros, which uses off-the-shelf automated theorem proving tools to perform formal analysis of virtual networks constructed using AWS APIs. Tiros encodes the semantics of AWS networking concepts into logic and then uses a variety of reasoning engines to verify security-related properties. Tools that Tiros can use include Soufflé [17], MonoSAT [3], and Vampire [23]. Tiros performs its analysis statically: it sends no packets on the customer's network. This distinction is important. The size of many customer networks makes it intractable to find problems through traditional network probing or penetration testing. Tiros allows users to gain assurance about the security of their networks that would be impossible through testing.

Tiros is used directly today by AWS customers as part of the Amazon Inspector service [11], which currently checks six Tiros-based network reachability invariants on customer networks. Tiros is especially popular amongst security-obsessed customers. For example, Bridgewater Associates, the world's largest hedge fund and an AWS customer, recently discussed the importance of network verification techniques for their organization [6], including their usage of Tiros.

**Related Work.** Several previous tools using automated theorem proving have been developed in an effort to answer questions about software-defined networks (SDNs) [1,2,5,12,13,16,19,25]. Similar to our approach, these tools reduce the problems to automated reasoning engines. In some cases, they employ overapproximative static analysis [18,19]. In other cases, they use general-purpose reasoning engines such as Datalog [12,15], BDDs [1], SMT [5,16], and SAT solvers [2,25]. VeriCon [2], NICE [8], and VeriFlow [19] verify network invariants by analyzing SDN programs, with the former two applying formal software verification techniques, and the latter using static analysis to split routes into equivalence classes. SecGuru [5,16] uses an SMT solver to compare the routes admitted by access control lists (ACLs), routing tables, and border gateway protocol (BGP) policies, but does not support full-network reachability queries. In our approach, we employ multiple encodings and reasoning engines. Our SMT encoding is similar in design to Anteater [25] and ConfigChecker [1]. Anteater performs SAT-based bounded model checking [4], while ConfigChecker uses BDD-based fixed-point model checking [7]. Previous work has applied Datalog to reachability analysis in either software or network contexts [12–14,24]. The approach used in Batfish [13,24] and SyNET [12] is similar to our Datalog approach; they allow users to express general queries about whole-network reachability properties using an expressive logic language. Batfish presents results for small but complex routing scenarios, involving a few dozen routers. SyNET [12] also uses a similar Datalog representation of network reachability semantics, but rather than verifying network reachability properties, it provides techniques to synthesize networks from a specification.
The focus in Tiros's encoding is expressiveness and completeness; it encodes the semantics of the entire AWS cloud network service stack. It scales well to networks consisting of hundreds of thousands of instances, routers, and firewall rules.

#### **2 AWS Networking**

AWS provides customers with virtualized implementations of practically all known traditional networking concepts, *e.g.* subnets, route tables, and NAT gateways. In order to facilitate on-demand scalability, many AWS network features focus on elasticity, *e.g.* Elastic Load Balancers (ELBs) support autoscaling groups, which customers configure to describe when/how to scale resource usage. Another important AWS networking concept is that of Virtual Private Cloud (VPC), in which customers can use AWS resources in an isolated virtual network that they control. Over 30 additional networking concepts are supported by AWS, including Elastic Network Interfaces (ENIs), internet gateways, transit gateways, direct connections, and peering connections.

Figure 1 is an example AWS-based network that consists of two subnets "Web" and "Database". The "Web" subnet contains two instances (sometimes called virtual machines) and the "Database" subnet contains one instance. Note that these machines are in fact virtualized in the AWS data center. The "Web" subnet's route table has a route to the internet gateway, whereas the "Database" subnet's route table only has local routes (within the VPC). In addition, each of the subnets has an ACL that contains security access rules. In particular, one of the rules forbids SSH access to the database servers.

**Fig. 1.** An example VPC network

AWS-based networks frequently start small and grow over time, accumulating new instances and security and access rules. Customers or regulators want to make sure that their VPC networks retain security invariants as their complexity grows. A customer may ask *network configuration questions* such as:


To answer such questions we must reason about which network components are accessible via feasible paths through the VPC, either from the internet, from other components in the VPC, or from other components in a different VPC via a peering connection or transit gateway.

## **3 AWS Networking Semantics as Logic**

Tiros statically builds a model of an AWS network architecture to check reachability properties. The model of the network consists of two parts, the *formal specification* and the *snapshot* of the network. The specification formalizes the semantics of the AWS networking components, *e.g.*, how a route table directs traffic from a subnet, in which order a firewall applies rules in a security group, and how load balancers route traffic. The snapshot describes the topology and details of the network. For example, the snapshot contains the list of instances, subnets, and their route tables in a particular VPC (or set of VPCs). To answer reachability questions, Tiros combines the formal specification, the snapshot, and a query into a formula that represents the answer. Tiros uses up to three reasoning engines to answer queries: the Datalog solver Soufflé [17], the SMT solver MonoSAT [3], or the first-order theorem prover Vampire [23]. Due to the differing limitations and capabilities of each of these tools, we maintain three independent encodings of network semantics into logic, one for each solver.

*Datalog Encoding.* In the Datalog encoding, a network model is a set of Datalog clauses (stratified, possibly recursive or negated Horn clauses without function symbols) using the theory of bit vectors to describe ports, IPv4 addresses, and subnet masks. The *specification* part of the network model contains types, predicates, constants, and rules that describe the semantics of the networking components in Amazon VPCs. The specification of Amazon VPC networks maps to approximately 50 types, 200 predicates, and over 240 rules. For example, a specification of the semantics of SSH tunneling is defined recursively: An instance can SSH tunnel to another instance iff it can either SSH to it directly, or through a chain of intermediate instances. We express this with predicates *canSshTunnel* and *canSsh*, of the type Instance × Instance, and rules:

$$\begin{aligned} canSshTunnel(I\_1, I\_2) &\leftarrow canSsh(I\_1, I\_2).\\ canSshTunnel(I\_1, I\_2) &\leftarrow canSshTunnel(I\_1, I\_3) \land canSshTunnel(I\_3, I\_2). \end{aligned}$$
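To make the recursive rules concrete, here is a minimal Python sketch of naive bottom-up evaluation of the two $\mathit{canSshTunnel}$ rules to their least fixpoint. The instance names are hypothetical, and a production engine such as Soufflé evaluates such rules far more efficiently than this loop does.

```python
def ssh_tunnel(can_ssh):
    """Naive bottom-up evaluation of the two canSshTunnel rules to their least fixpoint."""
    tunnel = set(can_ssh)                  # rule 1: canSsh(I1, I2) gives canSshTunnel(I1, I2)
    while True:
        derived = {(i1, i2)                # rule 2: chain two tunnels through I3
                   for (i1, i3) in tunnel
                   for (j3, i2) in tunnel
                   if i3 == j3}
        if derived <= tunnel:              # nothing new: fixpoint reached
            return tunnel
        tunnel |= derived


facts = {("bastion", "web1"), ("web1", "db1")}
print(sorted(ssh_tunnel(facts)))
# [('bastion', 'db1'), ('bastion', 'web1'), ('web1', 'db1')]
```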

The *snapshot* part of the network model contains constants and *facts* (ground clauses with no antecedents) that describe the configuration of a specific AWS network. Constants have the form $\mathit{type}_{\mathit{id}}$. For example, the snapshot of a network with an instance with id 1234 in a subnet with id web consists of the constants $\mathit{instance}_{1234}$ and $\mathit{subnet}_{\mathit{web}}$, and the fact $\mathit{hasSubnet}(\mathit{instance}_{1234}, \mathit{subnet}_{\mathit{web}})$.

We illustrate the Datalog encoding using examples from Sect. 2. The network configuration question, $q(I)$, is encoded as $q(I) \leftarrow \mathit{hasSubnet}(I, \mathit{subnet}_{\mathit{web}}) \land \mathit{hasTag}(I, \mathit{tag}_{\mathit{bastion}})$. The network reachability question, $r(I, E)$, is encoded as:
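Evaluated over a snapshot, the configuration question $q(I)$ is a join of two fact relations. The Python sketch below uses a hypothetical snapshot, with constants written as `type_id` strings, and plain set comprehensions in place of a Datalog engine.

```python
def q(has_subnet, has_tag):
    """q(I) <- hasSubnet(I, subnet_web) and hasTag(I, tag_bastion), evaluated as a join."""
    return {i for (i, subnet) in has_subnet
            if subnet == "subnet_web" and (i, "tag_bastion") in has_tag}


# A hypothetical snapshot: ground facts over constants of the form type_id.
has_subnet = {("instance_1234", "subnet_web"), ("instance_5678", "subnet_web"),
              ("instance_9012", "subnet_database")}
has_tag = {("instance_1234", "tag_bastion"), ("instance_9012", "tag_bastion")}
print(q(has_subnet, has_tag))  # {'instance_1234'}
```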

$$\begin{aligned} r(I,E) \leftarrow\ & \mathit{hasEni}(I,E) \land \mathit{isPublic}(\mathit{Address}) \land {} \\ & \mathit{reachPublicTcpUdp}(\mathit{dir}_{\mathit{ingress}}, \mathit{proto}_6, E, \mathit{port}_{22}, \mathit{Address}, \mathit{port}_{40000}). \end{aligned}$$

In our Datalog encoding, we use the theory of bitvectors to reason about ports, IP addresses, and CIDRs. We use Soufflé as our Datalog solver, but in principle other Datalog solvers could also be used, so long as they also support bitvectors. We direct the reader to our co-author's dissertation (cf. Chapter 7 [28]) for a more detailed explanation of the Datalog encoding.

**Fig. 2.** (Left) The symbolic graph corresponding to the VPC in Fig. 1. (Right) A simplified symbolic packet, composed of bitvectors.

*SMT Encoding.* Our SMT encoding models network reachability as a *symbolic graph* of network components, along with one or more symbolic packet headers consisting of bitvectors for the source and destination addresses and ports. A symbolic graph consists of a set of nodes and directed edges, where the edges may be traversable or untraversable. Predicate edge(u, v), where u and v are nodes, is true iff the corresponding edge is traversable. The assignment of the edge(u, v) atoms in the formula determines which paths exist in the graph.
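The meaning of the $\mathit{edge}(u, v)$ atoms can be illustrated with a Python sketch that evaluates graph reachability for one fixed truth assignment. In the actual encoding, MonoSAT searches over assignments symbolically, so this only illustrates what a given assignment denotes. The node names loosely follow Fig. 2 and are hypothetical.

```python
from collections import deque


def reaches(edges, assignment, start, end):
    """Evaluate reaches(start, end) for ONE fixed truth assignment to the edge(u, v) atoms."""
    succ = {}
    for (u, v) in edges:
        if assignment[(u, v)]:            # only traversable edges can be used
            succ.setdefault(u, []).append(v)
    seen, frontier = {start}, deque([start])
    while frontier:
        u = frontier.popleft()
        if u == end:
            return True
        for v in succ.get(u, []):
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return False


edges = [("Internet", "Igw"), ("Igw", "Subnet-web"),
         ("Subnet-web", "Eni-a"), ("Eni-a", "Instance-a")]
all_on = {e: True for e in edges}
print(reaches(edges, all_on, "Internet", "Instance-a"))  # True
print(reaches(edges, {**all_on, ("Igw", "Subnet-web"): False}, "Internet", "Instance-a"))  # False
```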

Figure 2 shows a symbolic graph corresponding to the VPC from Fig. 1. In our encoding, nodes represent networking components (such as instances, network interfaces, subnets, route tables, or gateways), and edges represent possible paths that packets may take between those components (such as between an instance and its network interface). Constraints between edge atoms and bitvectors in the packet headers define the routes that a packet can take.

For example, our encoding introduces an edge between each network interface node, Eni-a, and its containing Subnet-web node, edge(Eni-a, Subnet-web). As shown in Fig. 3, we also introduce constraints that force edge(Eni-a, Subnet-web) to be false if the packet's source address does not match the ENI's IP address. This ensures that packets leaving the ENI must have that ENI's IP address as their source address. Similar constraints ensure that packets entering the ENI must have that ENI's IP address as their destination address.
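The ENI constraints have the shape of an implication: edge traversable implies an address match. The Python sketch below evaluates them for concrete values, with an IPv4 address treated as the 32-bit bitvector value held in the symbolic packet. Modelling the "entering" direction as a separate edge $\mathit{edge}(\text{Subnet-web}, \text{Eni-a})$ is our assumption, and so are the function names. In MonoSAT both the edge literal and the comparison would be symbolic.

```python
import ipaddress


def ip_bits(text):
    """An IPv4 address as the 32-bit bitvector value stored in the symbolic packet."""
    return int(ipaddress.IPv4Address(text))


def leaving_ok(edge_eni_to_subnet, eni_ip, pkt_src):
    """edge(Eni, Subnet) => pkt.src == ip(Eni): packets leaving the ENI carry its address."""
    return (not edge_eni_to_subnet) or pkt_src == ip_bits(eni_ip)


def entering_ok(edge_subnet_to_eni, eni_ip, pkt_dst):
    """edge(Subnet, Eni) => pkt.dst == ip(Eni): packets entering the ENI are addressed to it."""
    return (not edge_subnet_to_eni) or pkt_dst == ip_bits(eni_ip)


print(leaving_ok(True, "10.0.0.5", ip_bits("10.0.0.5")))  # True: edge may be traversable
print(leaving_ok(True, "10.0.0.5", ip_bits("10.0.0.6")))  # False: edge forced untraversable
```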

We encode reachability constraints into this graph using the SMT solver MonoSAT [3], which supports a theory of finite graph reachability. Specifically, we add a start and end node to the graph, with edges to the source components of the query and from the destination components of the query, and then we enforce a graph reachability constraint reaches(start, end), which is true iff there is a start-end path under the assignment to the edge literals. To encode the query "Are there any instances that can be accessed from the public internet over SSH?", we would add an edge from the start node to the internet, and from each EC2 instance to the end node. Additionally, we would add bitvector constraints forcing the protocol of the symbolic packet to be exactly 6 (TCP), and the destination port to be exactly 22.
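The structure of the SSH query can be mirrored in a brute-force Python sketch. Unlike MonoSAT, which searches for a symbolic packet and an edge assignment together, this sketch checks a single concrete packet. Each edge is traversable iff a hypothetical guard over the packet holds, standing in for the bitvector constraints. The network and its ACL guards are made up for illustration.

```python
from collections import deque


def ssh_query(guarded_edges, internet, instances, packet):
    """'Can some instance be reached from the internet over SSH?' for ONE concrete packet.
    An edge is traversable iff its guard holds for the packet (hypothetical stand-in for
    the bitvector constraints); start/end edges are added as in the query encoding."""
    if packet["proto"] != 6 or packet["dst_port"] != 22:  # query constraints: TCP, port 22
        return False
    succ = {"start": [internet]}
    for (u, v), guard in guarded_edges.items():
        if guard(packet):
            succ.setdefault(u, []).append(v)
    for inst in instances:
        succ.setdefault(inst, []).append("end")
    seen, todo = {"start"}, deque(["start"])
    while todo:
        for v in succ.get(todo.popleft(), []):
            if v not in seen:
                seen.add(v)
                todo.append(v)
    return "end" in seen


def always(p):
    return True


network = {
    ("Internet", "Igw"): always,
    ("Igw", "Subnet-web"): always,
    ("Subnet-web", "Instance-web"): lambda p: p["dst_port"] in (22, 443),  # web ACL
    ("Subnet-db", "Instance-db"): lambda p: p["dst_port"] != 22,  # no route from Igw anyway
}
ssh = {"proto": 6, "dst_port": 22}
print(ssh_query(network, "Internet", ["Instance-web", "Instance-db"], ssh))  # True
```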

**Fig. 3.** A small portion of the VPC graph, with constraints over the edges between an ENI and its subnet enforcing that packets entering or leaving the ENI have that ENI's source or destination address.

The SMT encoding described above is intended specifically for answering network reachability queries, and does not currently take into account other properties (such as tags) that would be required to model the more general network configuration queries supported by our Datalog encoding.

*First-Order Encoding.* In our encoding for superposition solvers such as Vampire [23], we translate each network configuration question into a many-sorted first-order logic problem that is unsatisfiable iff the answer to the question is true, and each network reachability question into a FOL problem that only has finite models, each corresponding to an answer to the question. For this encoding, we assume that network configuration questions have strictly yes/no answers, while network reachability questions return lists of solutions. In addition to its default saturation mode, Vampire implements a MACE-style [26] finite model builder for many-sorted first-order logic [27]. Thus we use Vampire both as a saturation-based theorem prover and a finite model builder, running both modes in parallel and recording the result of the fastest successful run.

Our encoding begins with the same set of facts as were generated from the network model by our Datalog encoding, represented here by the symbols A<sub>1</sub>, A<sub>2</sub>, and so on. From there, we handle network configuration and network reachability questions differently: network configuration encodings are optimized for proof by contradiction, while reachability encodings are optimized for model building. Proof by contradiction for yes/no questions is potentially faster than model building, as intermediate variables need not be enumerated.

We encode a network configuration question ϕ in negated form: A<sub>1</sub> ∧ ... ∧ A<sub>n</sub> ⇒ ¬ϕ. If Vampire can prove a contradiction in the negated formula, then ϕ holds. We encode a network reachability question ϕ into a formula of the form A<sub>1</sub> ∧ ... ∧ A<sub>n</sub> ∧ (∀z̄)(q(z̄) ⇔ ϕ) ⇒ (∀z̄)q(z̄), where q is a fresh predicate symbol and z̄ are the free variables of the network question ϕ. Each substitution of z̄ that satisfies q corresponds to a distinct solution to the reachability question.

Our encoding targets Vampire's implementation of many-sorted first-order logic with equality, extended with the theory of linear integer arithmetic, the theory of arrays [22], and the theory of tuples [20]. We encode types, constants, and predicates using Clark completion [9]. We direct the reader to our co-author's dissertation (cf. Chapter 5 [21]) for a more detailed explanation of the Vampire encoding, including a detailed analysis of the performance trade-offs considered in this encoding.

#### **4 Usage and Performance**

In this section we describe the performance of the various solvers when used by Tiros in practice. Recall that our MonoSAT implementation can only answer reachability questions, whereas the other implementations also answer more general network configuration questions (such as the examples in Sect. 2).

In our experiments with Vampire, we found that the first-order logic encoding we used does not scale well. As we were not able to obtain good performance from our Vampire-based implementation, in what follows we only present the experimental results for MonoSAT and Soufflé. We attribute the poor performance of the Vampire encoding mainly to the fact that large finite domains, routinely used in network specifications, are represented as long clauses coming from the domain closure axioms. Saturation theorem provers, including Vampire, have a hard time dealing with such clauses.

*Amazon Inspector.* To compare the performance of Soufflé and MonoSAT in the context of the Tiros-based Amazon Inspector feature, we randomly selected 10,000 network snapshots evaluated in December 2018. On these queries Soufflé required 4.1 s in the best case and 45.1 s in the worst case, with a 50th-percentile runtime of 5.1 s and a 90th-percentile runtime of 5.5 s. MonoSAT required 0.8 s in the best case and 2.6 s in the worst case, with a 50th-percentile runtime of 1.39 s and a 90th-percentile runtime of 1.79 s. To give the reader an idea of the relative size of the constraint systems solved, in the smallest case our Soufflé encoding consisted of 2,856 facts, and the MonoSAT encoding consisted of 609 variables, 21 bitvectors, and 2,032 clauses. In the largest case, our Soufflé encoding consisted of 7,517 facts, and the MonoSAT encoding consisted of 2,038 variables, 21 bitvectors, and 17,731 clauses.

*Scalability Tests.* MonoSAT and Soufflé scale to all queries evaluated using Amazon Inspector. To help understand the limits of the Soufflé- and MonoSAT-based backends on larger networks, in Fig. 4 we compare the performance of the solvers on a series of artificially generated networks of increasing size, with 100, 1000, 10,000, and 100,000 instances. In each case, the query is *"list all open paths from the Internet to any instance in the VPC"*. We can see from the figure that neither approach dominates. In most cases the Datalog encoding is able to scale to 10,000 instances, but in no case can it scale to 100,000 instances. In most cases the SMT encoding is able to scale to networks with 100,000 instances, but for the 'benchmark-2' networks, MonoSAT requires almost a full hour to solve the 10,000-instance network that Soufflé solves in 81 s. The SMT encoding performs poorly on 'benchmark-2' because that benchmark has a vast number of distinct feasible paths through the network, each requiring a separate SMT solver call. Other benchmarks have fewer distinct paths.

**Fig. 4.** Comparison of runtime in seconds for the different solver backends. Each benchmark uses a different color, *e.g.* Soufflé on benchmark-1 is a solid blue line, and MonoSAT on benchmark-1 is a dashed blue line. In these experiments, Soufflé recompiles each query before solving it, which adds ≈ 45 s to the runtime of each Soufflé query. In practice this cost can be amortized by caching compiled queries. (Color figure online)

*Automating PCI Compliance Auditing.* Many AWS services are built using other AWS services, *e.g.* AWS Lambda is built using AWS EC2 and the various AWS networking features. Thus within AWS we use Tiros to prove that our own internal requirements are met. As an example, we use Tiros to partially automate evidence generation for compliance audits of the Payment Card Industry Data Security Standard (PCI DSS) [10]. Tiros is used across the many customer-facing AWS services that are built using AWS networking to establish controls supporting PCI DSS requirements 1.2, 1.3.1, 1.3.2, 1.3.4, and 1.3.7a.

*Custom Application.* AWS's Professional Services team works with some of the most security-obsessed customers to use advanced tools such as Tiros to achieve custom-tailored solutions. For example, as discussed in a public lecture [6], Bridgewater Associates worked with AWS Professional Services to build a Tiros-based solution which proves invariants of new AWS-based network designs before they are deployed in Bridgewater's AWS environment. Proof of these invariants assures the absence of possible data exfiltration paths that could be leveraged by an adversary.

#### **5 Conclusion**

We have described the first complete formalization of AWS networking semantics into logic. For customers of AWS services, Tiros provides deep insights into AWS networking. Via the incorporation of Tiros into the Amazon Inspector service, millions of AWS customers are able to automatically and continuously maintain their network-based security posture. They can now show compliance with security requirements at a scale that was impossible before. Internally within AWS, we are also able to automate some aspects of compliance evidence generation, which lowers our costs and increases our ability to quickly launch new features and services.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Distributed Systems and Networks

## **Verification of Threshold-Based Distributed Algorithms by Decomposition to Decidable Logics**

Idan Berkovits<sup>1</sup>(✉), Marijana Lazić<sup>2,3</sup>, Giuliano Losa<sup>4</sup>, Oded Padon<sup>5</sup>, and Sharon Shoham<sup>1</sup>

> <sup>1</sup> Tel Aviv University, Tel Aviv-Yafo, Israel, berkovits@mail.tau.ac.il
>
> <sup>2</sup> TU Wien, Vienna, Austria
>
> <sup>3</sup> TU Munich, Munich, Germany
>
> <sup>4</sup> University of California, Los Angeles, USA
>
> <sup>5</sup> Stanford University, Stanford, USA

**Abstract.** Verification of fault-tolerant distributed protocols is an immensely difficult task. Often, in these protocols, *thresholds* on set cardinalities are used both in the process code and in its correctness proof, e.g., a process can perform an action only if it has received an acknowledgment from at least half of its peers. Verification of threshold-based protocols is extremely challenging as it involves two kinds of reasoning: first-order reasoning about the unbounded state of the protocol, together with reasoning about sets and cardinalities. In this work, we develop a new methodology for decomposing the verification task of such protocols into *two* decidable logics: EPR and BAPA. Our key insight is that such protocols use thresholds in a restricted way as a means to obtain certain properties of "intersection" between sets. We define a language for expressing such properties, and present two translations: into EPR and into BAPA. The EPR translation allows verifying the protocol while assuming these properties, and the BAPA translation allows verifying the correctness of the properties. We further develop an algorithm for automatically generating the properties needed for verifying a given protocol, facilitating fully automated deductive verification. Using this technique we have verified several challenging protocols, including Byzantine one-step consensus, hybrid reliable broadcast and fast Byzantine Paxos.

#### **1 Introduction**

Fault-tolerant distributed protocols play an important role in the avionic and automotive industries, medical devices, cloud systems, blockchains, etc. Their unexpected behavior might put human lives at risk or cause a huge financial loss. Therefore, their correctness is of ultimate importance.

Ensuring correctness of distributed protocols is a notoriously difficult task, due to the unbounded number of processes and messages, as well as the nondeterministic behavior caused by the presence of faults, concurrency, and message delays. In general, the problem of verifying such protocols is undecidable. This suggests two directions for attacking the problem: (i) developing fully automatic verification techniques for *restricted* classes of protocols, or (ii) designing deductive techniques for a wide range of systems that *require user assistance*. Within the latter approach, recently emerging techniques [29] leverage decidable logics that are supported by mature automated solvers to significantly reduce user effort and increase verification productivity. Such logics bring several key benefits: (i) their solvers usually enjoy stable performance, and (ii) whenever annotations provided by the user are incorrect, the automated solvers can provide a counterexample for the user to examine.

Deductive verification based on decidable logic requires a logical formalism that satisfies two conflicting criteria: the formalism should be expressive enough to capture the protocol, its correctness properties, its inductive invariants, and ultimately its verification conditions. At the same time, the formalism should be decidable and have an effective automated tool for checking verification conditions.

In this paper we develop a methodology for deductive verification of *threshold-based* distributed protocols using well-established decidable logics, thereby settling the tension explained above.

In threshold-based protocols, a process may take different actions based on the number of processes from which it received certain messages. This is often used to achieve fault-tolerance. For example, a process may take a certain step once it has received an acknowledgment from a strict majority of its peers, that is, from more than **n**/2 processes, where **n** is the total number of processes. Expressions such as **n**/2 are called *thresholds*, and in general they can depend on additional parameters, such as the maximal number of crashed processes or the maximal number of Byzantine processes.

Verification of such protocols requires two flavors of reasoning, as demonstrated by the following example. Consider the Paxos [20] protocol, in which each process proposes a value and all must agree on a common proposal. The protocol tolerates up to **t** process crashes, and ensures that every two processes that decide agree on the decided value. The protocol requires **n** > 2**t** processes, and each process must obtain confirmation messages from **n** − **t** processes before making a decision. The protocol is correct due to, among other things, the fact that if **n** > 2**t** then any two sets of **n** − **t** processes have a process in common. To verify this protocol we need to express (i) relationships between an unbounded number of processes and values, which typically requires quantification over uninterpreted domains ("every two processes"), and (ii) properties of sets of certain cardinalities ("any two sets of **n** − **t** processes intersect"). Crucially, these two types of reasoning are intertwined, as the sets of processes for which we need to capture cardinalities may be defined by their relations with other state components ("messages from at least **n** − **t** processes"). While uninterpreted first-order logic (FOL) seems like the natural fit for the first type of reasoning, it is seemingly a poor fit for the second type, since it cannot express set cardinalities and the arithmetic used to define thresholds. Typically, logics that combine both types of reasoning are either undecidable or not flexible enough to capture protocols as intricate as the ones we consider.
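The intersection fact used in the Paxos argument can be sanity-checked by brute force for small parameters. The Python sketch below (the function name is hypothetical) enumerates all sets of exactly **n** − **t** processes and confirms, for **n** < 8, that any two of them intersect iff **n** > 2**t**. A bounded check like this illustrates the claim but is not a proof. It also shows the kind of arithmetic reasoning that uninterpreted FOL cannot express.

```python
from itertools import combinations

def quorums_intersect(n, t):
    """Brute force: do all pairs of (n - t)-subsets of n processes share a process?

    Checking sets of size exactly n - t suffices: any larger set contains one.
    """
    quorums = [frozenset(q) for q in combinations(range(n), n - t)]
    # A nonempty frozenset is truthy, so this asks whether every pair intersects.
    return all(q1 & q2 for q1 in quorums for q2 in quorums)

for n in range(1, 8):
    for t in range(0, n):
        # The brute-force answer agrees with the arithmetic condition n > 2t.
        assert quorums_intersect(n, t) == (n > 2 * t), (n, t)
print("n > 2t  <=>  any two sets of n - t processes intersect (checked for n < 8)")
```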

The approach we present relies on the observation that threshold-based protocols and their correctness proofs use set cardinality thresholds in a restricted way as a means to obtain certain properties between sets, and that these properties can be expressed in FOL via a suitable encoding. In the example above, the important property is that every two sets of cardinality at least **n** − **t** have a non-empty intersection. This property can be encoded in FOL by modeling sets of cardinality at least **n** − **t** using an uninterpreted sort along with a membership relation between this sort and the sort for processes. However, the validity of the property under the assumption that **n** > 2**t** cannot be verified in FOL.

The key idea of this paper is, hence, to decompose the verification problem of threshold-based protocols into the following problems: (i) Checking protocol correctness assuming certain intersection properties, which can be reduced to verification conditions expressed in the Effectively Propositional (EPR) fragment of FOL [25,35]. (ii) Checking that sets with cardinalities adhering to the thresholds satisfy the intersection properties (under the protocol assumptions), which can be reduced to validity checks in quantifier-free Boolean Algebra with Presburger Arithmetic (BAPA) [19]. Both BAPA and EPR are decidable logics, and are supported by mature solvers.

A crucial step in employing this decomposition is finding suitable intersection properties that are strong enough to imply the protocol's correctness (i.e., imply the FOL verification conditions), and are also implied by the precise definitions of the thresholds and the protocol's assumptions. Thus, these intersection properties can be viewed as *interpolants* between the FOL verification conditions and the thresholds in the context of the protocol's assumptions. We present fully automated procedures to find such intersection property interpolants, either eagerly or lazily.

The main contributions of this paper are<sup>1</sup>:


<sup>1</sup> An extended version of this paper, which includes additional details and proofs, appears in [3].

thresholds and the protocol's assumptions using arithmetic; verification is carried out automatically via decomposition to well-established decidable logics.

5. We implement the approach, leveraging mature existing solvers (Z3 and CVC4), and evaluate it by verifying several challenging threshold-based protocols with sophisticated thresholds and assumptions. Our evaluation shows the effectiveness and flexibility of our approach in modeling and verifying complex protocols, including the feasibility of automatically inferring threshold intersection properties.

## **2 Preliminaries**

**Transition Systems in FOL.** We model distributed protocols as transition systems expressed in many-sorted FOL. A state of the system is a first-order (FO) structure s = (D, I) over a vocabulary Σ that consists of sorted constant, function and relation symbols, s.t. s satisfies a finite set of *axioms* Θ in the form of closed formulas over Σ. D is the *domain* of s, mapping each sort to a set of objects (elements), and I is the *interpretation function*. A FO *transition system* is a tuple (Σ, Θ, I, *TR*), where Σ and Θ are as above, I is a closed formula over Σ that defines the *initial states*, and *TR* is a closed formula over Σ ⊎ Σ′ that defines the *transition relation*, where Σ describes the source state of a transition and Σ′ = {a′ | a ∈ Σ} describes the target state. We require that *TR* does not modify any symbol that appears in Θ. The set of reachable states is defined as usual. In practice, we define FO transition systems using a modeling language with a convenient syntax [29].

**Properties and Inductive Invariants.** A *safety property* is expressed by a closed FO formula P over Σ. The system is *safe* if all of its reachable states satisfy P. A closed FO formula *Inv* over Σ is an *inductive invariant* for a transition system (Σ, Θ, I, *TR*) and property P if the following formulas, called the *verification conditions*, are valid (equivalently, their negations are unsatisfiable): (i) Θ → (I → *Inv*), (ii) Θ → (*Inv* ∧ *TR* → *Inv*′) and (iii) Θ → (*Inv* → P), where *Inv*′ results from substituting every symbol in *Inv* by its primed version. We also use inductive invariants to verify arbitrary first-order LTL formulas via the reduction of [30,31].
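For intuition, the three verification conditions can be checked by enumeration when the state space is finite and explicit. The toy system below is hypothetical: one variable x over 0..7 that starts at 0 and steps by 2 modulo 8, with no axioms (Θ is trivially true). It is much simpler than the FO transition systems considered here, whose verification conditions are discharged by a solver rather than by enumeration.

```python
# Hypothetical toy system: x ranges over 0..7, starts at 0, and steps by 2 (mod 8).
STATES = range(8)

def init(x):
    return x == 0

def tr(x, x_next):
    return x_next == (x + 2) % 8

def prop(x):  # safety property P: x is never 7
    return x != 7

def inv(x):   # candidate inductive invariant: x is even
    return x % 2 == 0

def check_vcs(init, tr, inv, prop, states):
    """Return (initiation, consecution, safety) by enumerating all states."""
    initiation = all(inv(s) for s in states if init(s))
    consecution = all(inv(s2) for s in states if inv(s) for s2 in states if tr(s, s2))
    safety = all(prop(s) for s in states if inv(s))
    return initiation, consecution, safety

print(check_vcs(init, tr, inv, prop, STATES))   # (True, True, True)
print(check_vcs(init, tr, prop, prop, STATES))  # (True, False, True)
```

The second call shows that P itself holds on all reachable states but is not inductive: the unreachable state x = 5 satisfies P and steps to 7.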

**Effectively Propositional Logic (EPR).** The effectively-propositional (EPR) fragment of FOL is restricted to formulas without function symbols and with a quantifier prefix ∃\*∀\* in prenex normal form. Satisfiability of EPR formulas is decidable [25]. Moreover, EPR formulas enjoy the *finite model property*, i.e., ϕ is satisfiable iff it has a finite model. We consider a straightforward extension of EPR that maintains these properties and is supported by solvers such as Z3 [5]. The extension allows function symbols and quantifier alternations as long as the formula's *quantifier alternation graph*, denoted *QA*(ϕ), is acyclic. For ϕ in negation normal form, *QA*(ϕ) is a directed graph where the set of vertices is the set of sorts and the set of edges is defined as follows: every function symbol introduces edges from its arguments' sorts to its image's sort, and every existential quantifier ∃x that resides in the scope of universal quantifiers introduces edges from the sorts of the universally quantified variables to the sort of x. The quantifier alternation graph is extended to sets of formulas as expected.
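The acyclicity condition is a plain graph check. The Python sketch below builds *QA*(ϕ) from a hypothetical summary of ϕ: the signatures of its function symbols and, for each existential quantifier under universals, the sorts of the enclosing universal variables. It then detects cycles by depth-first search. Extracting that summary from an actual formula in negation normal form is assumed and not shown, and the sort names `quorum` and `node` are illustrative.

```python
def qa_graph(functions, alternations):
    """Build the quantifier alternation graph as {sort: set of successor sorts}.

    functions:    list of (argument_sorts, result_sort), one per function symbol
    alternations: list of (universal_sorts, existential_sort), one per existential
                  quantifier in the scope of universally quantified variables
    """
    edges = {}
    for arg_sorts, result in functions:
        for s in arg_sorts:
            edges.setdefault(s, set()).add(result)
    for forall_sorts, exists_sort in alternations:
        for s in forall_sorts:
            edges.setdefault(s, set()).add(exists_sort)
    return edges

def is_acyclic(edges):
    """Three-color depth-first search: reaching a GRAY vertex means a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(u):
        color[u] = GRAY
        for v in edges.get(u, ()):
            c = color.get(v, WHITE)
            if c == GRAY or (c == WHITE and not visit(v)):
                return False
        color[u] = BLACK
        return True

    return all(color.get(u, WHITE) != WHITE or visit(u) for u in list(edges))

# forall q1, q2 : quorum. exists n : node. member(n, q1) & member(n, q2)
intersection = (["quorum", "quorum"], "node")
print(is_acyclic(qa_graph([], [intersection])))                       # True
# Adding  forall n : node. exists q : quorum. member(n, q)  closes a cycle.
print(is_acyclic(qa_graph([], [intersection, (["node"], "quorum")])))  # False
```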

**Boolean Algebra with Presburger Arithmetic (BAPA).** Boolean Algebra with Presburger Arithmetic (BAPA) [19] is a FO theory defined over two sorts: int (for integers), and set (for subsets of a finite universe). The language is defined as follows:

$$F ::= B\_1 = B\_2 \mid L\_1 = L\_2 \mid L\_1 < L\_2 \mid F\_1 \land F\_2 \mid F\_1 \lor F\_2 \mid \neg F \mid \exists x.F \mid \forall x.F \mid \exists u.F \mid \forall u.F$$

$$B ::= x \mid \emptyset \mid \mathbf{a} \mid B\_1 \cup B\_2 \mid B\_1 \cap B\_2 \mid B^c \qquad L ::= u \mid K \mid \mathbf{n} \mid i \mid L\_1 + L\_2 \mid K \cdot L \mid |B|$$

where L defines linear integer terms: u denotes an integer variable, k ∈ K denotes an (interpreted) integer constant symbol ..., −2, −1, 0, 1, 2, ..., **n** is an integer constant symbol that represents the size of the finite set universe, i is an uninterpreted integer constant symbol (as opposed to the constant symbols from K), and |b| denotes set cardinality; B defines set terms, where x denotes a set variable, ∅ is an (interpreted) set constant symbol that represents the empty set, and **a** is an uninterpreted set constant symbol; and F defines the set of BAPA formulas, where ℓ<sub>1</sub> = ℓ<sub>2</sub> and ℓ<sub>1</sub> < ℓ<sub>2</sub> are atomic arithmetic formulas and b<sub>1</sub> = b<sub>2</sub> is an atomic set formula. (Other set constraints such as b<sub>1</sub> ⊆ b<sub>2</sub> can be encoded in the usual way.) In the sequel, we also allow arithmetic terms of the form ℓ/k, where k ∈ K is a positive integer and ℓ ∈ L, as any formula that contains such terms can be translated to an equivalent BAPA formula by multiplying by k. A BAPA structure is s<sub>B</sub> = (D, I), where the domain D maps sort int to the set of all integers and maps sort set to the set of all subsets of a finite universe U, called the *universal set*. The semantics of terms and formulas is as expected, where the interpretation of the complement operation is defined with respect to U (e.g., I(∅<sup>c</sup>) = U), and the integer constant **n** is interpreted as the size of U, i.e., I(**n**) = |U|.

Both validity and satisfiability of BAPA formulas (with arbitrary quantification) are decidable [19], and the quantifier-free fragment is supported by CVC4 [2].

#### **3 First-Order Modeling of Threshold-Based Protocols**

Next we explain how we model threshold-based protocols as transition systems in FOL. (Note that FOL cannot directly express set cardinality constraints.) The idea is to capture each threshold by a designated sort, such that elements of this sort represent sets of nodes that satisfy the threshold. Elements of the threshold sort are then used instead of the actual threshold in the description of the protocol and in the verification conditions. For verification to succeed, some properties of the sets satisfying the cardinality threshold must be captured in FOL. This is done by introducing additional assumptions (formally, axioms of the transition system) expressed in FOL, as discussed in Sect. 4.


**Fig. 1.** Bosco: a one-step asynchronous Byzantine consensus algorithm [39], and an excerpt of the RML (relational modeling language) code of the main transition. Note that we overload the *member* relation for all threshold sorts. The formula ∃!x. ϕ(x) is shorthand for "there exists a unique x such that ϕ(x)".

**Running Example.** We illustrate our approach using the example of Bosco, an asynchronous Byzantine fault-tolerant (BFT) consensus algorithm [39]. Its modeling in first-order logic using our technique appears alongside informal pseudo-code in Fig. 1.

In the BFT consensus problem, each node proposes a value and correct nodes must decide on a unique proposal. BFT consensus algorithms typically require at least two communication rounds to reach a decision. In Bosco, nodes execute a preliminary communication step which, under favorable conditions, reaches an early decision, and then call an underlying BFT consensus algorithm to ensure reaching a decision even if these conditions are not met. Bosco is safe when **n** > 3**t**; it guarantees that a preliminary decision will be reached if all nodes are non-faulty and propose the same value when **n** > 5**t** (weakly one-step condition), and even if some nodes are faulty, as long as all non-faulty nodes propose the same value, when **n** > 7**t** (strongly one-step condition).

Bosco achieves consensus by ensuring that (a) no two correct nodes decide differently in the preliminary step, and (b) if a correct node decides value v in the preliminary step then every correct process calls the underlying BFT consensus algorithm with proposal v. Property (a) is ensured by the fact that a node decides in the preliminary step only if more than (**n**+3**t**)/2 nodes proposed the same value. When **n** > 3**t**, two sets of cardinality greater than (**n**+3**t**)/2 have at least one non-faulty node in common, and therefore no two different values can be proposed by more than (**n**+3**t**)/2 nodes. Similarly, we can derive property (b) from the fact that a set of more than (**n**+3**t**)/2 nodes and a set of **n** − **t** nodes intersect in more than (**n**+**t**)/2 nodes, which, after removing the at most **t** nodes that may be faulty, still leaves more than (**n**−**t**)/2 nodes, satisfying the condition in line 9.
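The counting behind property (b) can be spelled out with the inclusion-exclusion bound |A ∩ B| ≥ |A| + |B| − **n**. Here A is a set of more than (**n**+3**t**)/2 nodes, B is a set of **n** − **t** nodes, and **f** is the set of at most **t** faulty nodes (the names A and B are ours):

$$\begin{aligned} |A \cap B| &\geq |A| + |B| - \mathbf{n} > \frac{\mathbf{n}+3\mathbf{t}}{2} + (\mathbf{n}-\mathbf{t}) - \mathbf{n} = \frac{\mathbf{n}+\mathbf{t}}{2} \\ |(A \cap B) \setminus \mathbf{f}| &\geq |A \cap B| - |\mathbf{f}| > \frac{\mathbf{n}+\mathbf{t}}{2} - \mathbf{t} = \frac{\mathbf{n}-\mathbf{t}}{2} \end{aligned}$$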

#### **3.1 Threshold-Based Protocols**

**Parameters and Resilience Conditions.** We consider protocols whose definitions depend on a set of *parameters*, *Prm*, divided into *integer parameters*, *Prm*<sub>I</sub>, and *set parameters*, *Prm*<sub>S</sub>. *Prm*<sub>I</sub> always includes **n**, the total number of nodes (assumed to be finite). Protocol correctness is ensured under a set of assumptions Γ called *resilience conditions*, formulated as BAPA formulas over *Prm* (this means that all the uninterpreted constants appearing in Γ are from *Prm*). In Bosco, *Prm*<sub>I</sub> = {**n**, **t**}, where **t** is the maximal number of Byzantine failures tolerated by the algorithm, and *Prm*<sub>S</sub> = {**f**}, where **f** is the set of Byzantine nodes; Γ = {**n** ≥ 3**t** + 1, |**f**| ≤ **t**}.

**Threshold Conditions.** Both the description of the protocol and the inductive invariant may include conditions that require the size of some set of nodes to be "at least t", "at most t", and so on, where the threshold t is of the form t = ℓ/k, where k is a positive integer and ℓ is a ground BAPA integer term over *Prm* (we do not allow comparing the sizes of two sets; we observe that this is not needed for threshold-based protocols). We denote the set of thresholds by T. For example, in Bosco, T = {**n** − **t**, (**n**+3**t**+1)/2, (**n**−**t**+1)/2}.

Without loss of generality, we assume that all conditions on set cardinalities are of the form "at least t", since every condition can be written this way, possibly by introducing new thresholds:

$$|X| > \frac{\ell}{k} \equiv |X| \ge \frac{\ell+1}{k} \qquad |X| \le \frac{\ell}{k} \equiv |X^c| \ge \frac{k \cdot \mathbf{n} - \ell}{k} \qquad |X| < \frac{\ell}{k} \equiv |X| \le \frac{\ell-1}{k}$$
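The three rewriting rules are mechanical, so they can be sketched as a small normalizer. In the Python sketch below, the term representation (a dict from parameter names to coefficients, with key `"1"` for the constant) and all function names are hypothetical. The `<` case is reduced to the `<=` rule, as in the equations above. `check_equivalence` confirms each rewrite for concrete parameter values using exact rational arithmetic; this is a bounded check, not a proof.

```python
from fractions import Fraction

# Hypothetical representation: a linear term over the parameters is a dict
# {parameter_name: coefficient}; the key "1" holds the constant coefficient.

def add_const(term, c):
    """Return term + c."""
    out = dict(term)
    out["1"] = out.get("1", 0) + c
    return out

def k_n_minus(k, term):
    """Return k*n - term."""
    out = {name: -coeff for name, coeff in term.items()}
    out["n"] = out.get("n", 0) + k
    return out

def to_at_least(op, ell, k):
    """Rewrite |X| op ell/k as (side, ell2, k), meaning |side| >= ell2/k,
    where side is "X" or "Xc" (the complement of X)."""
    if op == ">=":
        return "X", ell, k
    if op == ">":   # |X| > l/k   ==  |X| >= (l+1)/k
        return "X", add_const(ell, 1), k
    if op == "<=":  # |X| <= l/k  ==  |X^c| >= (k*n - l)/k
        return "Xc", k_n_minus(k, ell), k
    if op == "<":   # |X| < l/k   ==  |X| <= (l-1)/k, then the "<=" rule
        return to_at_least("<=", add_const(ell, -1), k)
    raise ValueError(f"unknown comparison {op!r}")

def evaluate(term, params):
    """Value of a linear term for concrete parameter values."""
    return sum(c * (1 if name == "1" else params[name]) for name, c in term.items())

def check_equivalence(op, ell, k, params):
    """Compare the original and rewritten conditions for every |X| in 0..n."""
    n = params["n"]
    bound = Fraction(evaluate(ell, params), k)
    side, ell2, k2 = to_at_least(op, ell, k)
    bound2 = Fraction(evaluate(ell2, params), k2)
    compare = {">=": lambda a, b: a >= b, ">": lambda a, b: a > b,
               "<=": lambda a, b: a <= b, "<": lambda a, b: a < b}[op]
    for size in range(n + 1):
        lhs = size if side == "X" else n - size
        if compare(size, bound) != (lhs >= bound2):
            return False
    return True

# Bosco: "more than (n+3t)/2" becomes "at least (n+3t+1)/2".
print(to_at_least(">", {"n": 1, "t": 3}, 2))  # ('X', {'n': 1, 't': 3, '1': 1}, 2)
print(check_equivalence("<", {"n": 1, "t": 3}, 2, {"n": 7, "t": 2}))  # True
```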

#### **3.2 Modeling in FOL**

**FO Vocabulary for Modeling Threshold-Based Protocols.** We describe the protocol's states (e.g., pending messages, votes, etc.) using a core FO vocabulary Σ<sub>C</sub> that includes sort node and additional sorts and symbols. Parameters *Prm* are *not* part of the FO vocabulary used to model the protocol. Also, we do not model set cardinality directly. Instead, we encode the cardinality thresholds in FOL by defining a FO vocabulary Σ<sup>*Prm*</sup><sub>T</sub>:


We then model the protocol as a transition system (Σ, Θ, I, *TR*) where Σ = Σ<sub>C</sub> ⊎ Σ<sup>*Prm*</sup><sub>T</sub>.

We are interested only in states (FO structures over Σ) where the interpretation of the threshold sorts and membership relations is according to their intended meaning in a corresponding BAPA structure. Formally, these are T-extensions, defined as follows:

**Definition 1.** *We say that a FO structure* s<sub>C</sub> = (D<sub>C</sub>, I<sub>C</sub>) *over* Σ<sub>C</sub> *and a BAPA structure* s<sub>B</sub> = (D<sub>B</sub>, I<sub>B</sub>) *over Prm are* compatible *if* D<sub>B</sub>(set) = P(D<sub>C</sub>(node)), *where* P *is the powerset operator. For such compatible structures and a set of thresholds* T *over Prm, the* T-extension *of* s<sub>C</sub> *by* s<sub>B</sub> *is the structure* s = (D, I) *over* Σ *defined as follows:*

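$$\begin{aligned} D(s) &= D\_C(s) && \text{for every sort } s \text{ in } \Sigma\_C \\ I(a) &= I\_C(a) && \text{for every } a \text{ in } \Sigma\_C \\ D(\mathit{set}\_t) &= \{A \subseteq D\_C(\mathit{node}) \mid |A| \geq I\_B(t)\} && \text{for every } t \in T \\ I(\mathit{member}\_{\mathbf{a}}) &= I\_B(\mathbf{a}) && \text{for every } \mathbf{a} \in \mathit{Prm}\_S \\ I(\mathit{member}\_t) &= \{(e, A) \mid e \in D\_C(\mathit{node}),\ A \in D(\mathit{set}\_t),\ e \in A\} && \text{for every } t \in T \end{aligned}$$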

Note that for the T-extension s to be well defined as a FO structure, we must have D(set<sub>t</sub>) ≠ ∅ for every threshold t ∈ T. This means that a T-extension by s<sub>B</sub> only exists if {A ⊆ D(node) | |A| ≥ I<sub>B</sub>(t)} ≠ ∅. This is ensured by the following condition:

**Definition 2 (Feasibility).** T *is* Γ-feasible *if* Γ ⊨ t ≤ **n** *for every* t ∈ T*.*
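For Bosco, Γ-feasibility of T = {**n** − **t**, (**n**+3**t**+1)/2, (**n**−**t**+1)/2} can be checked by hand. The derivation below uses only Γ = {**n** ≥ 3**t** + 1, |**f**| ≤ **t**}, from which **t** ≥ |**f**| ≥ 0 follows:

$$\begin{aligned} \mathbf{n}-\mathbf{t} \leq \mathbf{n} &\iff \mathbf{t} \geq 0, \text{ which follows from } 0 \leq |\mathbf{f}| \leq \mathbf{t} \\ \frac{\mathbf{n}+3\mathbf{t}+1}{2} \leq \mathbf{n} &\iff 3\mathbf{t}+1 \leq \mathbf{n}, \text{ which is a resilience condition} \\ \frac{\mathbf{n}-\mathbf{t}+1}{2} \leq \mathbf{n} &\iff 1-\mathbf{t} \leq \mathbf{n}, \text{ which holds since } 1-\mathbf{t} \leq 1 \leq 3\mathbf{t}+1 \leq \mathbf{n} \end{aligned}$$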

**Expressing Threshold Constraints.** Cardinality constraints can be expressed in FOL over the vocabulary Σ = Σ<sub>C</sub> ⊎ Σ<sup>*Prm*</sup><sub>T</sub> using quantification. To express that |{n : node | ϕ(n, ū)}| ≥ t, i.e., that there are at least t nodes that satisfy the FO formula ϕ over Σ<sub>C</sub> (where ū are free variables in ϕ), we use the following first-order formula over Σ: ∃q : set<sub>t</sub>. ∀n : node. *member*<sub>t</sub>(n, q) → ϕ(n, ū). Similarly, to express the property that a node is a member of a set parameter **a** (e.g., to check if n ∈ **f**, i.e., a node is faulty) we use the FO formula *member*<sub>**a**</sub>(n). For example, in Fig. 1, line 5 (right) uses the FO modeling to express the condition in line 5 (left). This modeling is sound in the following sense:

**Lemma 1 (Soundness).** *Let* s<sub>C</sub> = (D<sub>C</sub>, I<sub>C</sub>) *be a FO structure over* Σ<sub>C</sub>, s<sub>B</sub> = (D<sub>B</sub>, I<sub>B</sub>) *a compatible BAPA structure over Prm s.t.* s<sub>B</sub> ⊨ Γ, *and* T *a* Γ*-feasible set of thresholds over Prm. Then there exists a (unique)* T*-extension* s *of* s<sub>C</sub> *by* s<sub>B</sub>*. Further:*


**Definition 3.** *A first-order structure* s *over* Σ *is* threshold-faithful *if it is a* T*-extension of some* s<sub>C</sub> *by some* s<sub>B</sub> ⊨ Γ *(as in Lemma 1).*
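To see why the FO encoding ∃q : set<sub>t</sub>. ∀n : node. *member*<sub>t</sub>(n, q) → ϕ(n, ū) captures "at least t nodes satisfy ϕ" on threshold-faithful structures, it helps to evaluate it in one. The Python sketch below builds the set<sub>t</sub> domain and *member*<sub>t</sub> relation exactly as in a T-extension (all subsets with at least t elements, ordinary membership) over a small node set. It checks that the FO formula agrees with the cardinality condition. Function names and the example data are hypothetical, and only feasible thresholds (t ≤ number of nodes) are used.

```python
from itertools import chain, combinations

def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = list(universe)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]

def set_sort(nodes, t):
    """Domain of set_t in a T-extension: all sets of nodes with at least t elements."""
    return [A for A in subsets(nodes) if len(A) >= t]

def member(n, q):
    """Interpretation of member_t in a T-extension: ordinary set membership."""
    return n in q

def fo_at_least(nodes, t, phi):
    """Evaluate  exists q : set_t. forall n : node. member_t(n, q) -> phi(n)."""
    return any(all(phi(n) for n in nodes if member(n, q)) for q in set_sort(nodes, t))

nodes = range(5)
received = {0, 2, 3}  # hypothetical: nodes from which a message was received
print(fo_at_least(nodes, 3, lambda n: n in received))  # True
print(fo_at_least(nodes, 4, lambda n: n in received))  # False
```

Outside threshold-faithful structures the set<sub>t</sub> sort may contain arbitrary elements with arbitrary membership, which is why the encoding needs additional axioms to be useful.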

**Incompleteness.** Lemma 1 ensures that the FO modeling can be soundly used to verify the protocol. It also ensures that the modeling is precise on threshold-faithful structures (Def. 3). Yet, the FO transition system is not restricted to such states; hence it *abstracts* the actual protocol. To have any hope of verifying the protocol, we must capture *some* of the intended meaning of the threshold sorts and relations. This is obtained by adding FO axioms to the FO transition system. Soundness is maintained as long as the axioms hold in all threshold-faithful structures. We note that the set of *all* such axioms is not recursively enumerable; this is where the essential incompleteness of our approach lies.

#### **4 Decomposition via Threshold Intersection Properties**

In this section, we identify a set of properties we call *threshold intersection properties*. When captured via FO axioms, these properties suffice for verifying many threshold-based protocols (all the ones we considered). Importantly, these are properties of sets adhering to the thresholds that do not involve the protocol state. As a result, they can be expressed both in FOL and in BAPA. This allows us to decompose the verification task into: (i) checking that certain threshold properties are valid in all threshold-faithful structures by checking that they are implied by Γ (carried out using quantifier-free BAPA), and (ii) checking that the verification conditions of the FO transition system, with the same threshold properties taken as axioms, are valid (carried out in first-order logic, and in EPR if quantifier alternations are acyclic).

#### **4.1 Threshold Intersection Property Language**

Threshold properties are expressed in the *threshold intersection property language* (TIP). TIP is essentially a subset of BAPA, specialized to have the properties listed above.

**Syntax.** We define TIP as follows, with $t \in T$ a threshold (of the form $\frac{\ell}{k}$) and $\mathbf{a} \in Prm_S$:

$$\begin{aligned} F &::= B \neq \emptyset \mid B^c = \emptyset \mid g\_{\geq t}(B) \mid F\_1 \land F\_2 \mid \forall x : g\_{\geq t} F \\ B &::= \mathbf{a} \mid \mathbf{a}^c \mid x \mid x^c \mid \emptyset \mid \emptyset^c \mid B\_1 \cap B\_2 \end{aligned}$$

TIP restricts the use of set cardinality to *threshold guards* $g_{\geq t}(b)$ with the meaning $|b| \geq t$. No other arithmetic atomic formulas are allowed. Comparison atomic formulas are restricted to $b \neq \emptyset$ and $b^c = \emptyset$. Quantifiers must be guarded, and negation, disjunction and existential quantification are excluded. We forbid set union and restrict complementation to atomic set terms. We refer to such formulas as *intersection properties* since they express properties of intersections of (atomic) sets.

*Example 1.* In Bosco, the following property captures the fact that the intersection of a set of at least $\mathbf{n} - \mathbf{t}$ nodes and a set of more than $\frac{\mathbf{n}+3\mathbf{t}}{2}$ nodes contains more than $\frac{\mathbf{n}-\mathbf{t}}{2}$ non-faulty nodes. This property is needed to establish the correctness of the protocol.

$$\forall x: g\_{\geq \mathbf{n}-\mathbf{t}} . \forall y: g\_{\geq \frac{\mathbf{n}+3\mathbf{t}+1}{2}} . \ g\_{\geq \frac{\mathbf{n}-\mathbf{t}+1}{2}}(x \cap y \cap \mathbf{f}^c)$$
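As a sanity check on Example 1, the following Python sketch enumerates, for small fixed values of $\mathbf{n}$ and $\mathbf{t}$, all possible sizes of the eight Venn regions of $x$, $y$ and $\mathbf{f}$ (the truth of a quantifier-free BAPA formula depends only on these sizes). It assumes that the resilience condition bounds the faulty set by $|\mathbf{f}| \leq \mathbf{t}$; Bosco's actual Γ is not restated here, and the function names are hypothetical. A bounded enumeration is of course not a proof for all $\mathbf{n}$ and $\mathbf{t}$; that is the job of the BAPA solver.

```python
def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers that sums to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def example1_holds(n, t):
    """Check Example 1 for fixed n and t over all Venn-region sizes.

    Region r in 0..7 encodes membership: bit 0 = in x, bit 1 = in y, bit 2 = in f.
    Assumption (ours): the resilience condition provides |f| <= t.
    """
    for sizes in compositions(n, 8):
        def card(pred):
            return sum(s for r, s in enumerate(sizes) if pred(r))

        x = card(lambda r: r & 1)
        y = card(lambda r: r & 2)
        f = card(lambda r: r & 4)
        if f > t:
            continue                          # violates the assumed |f| <= t
        if x < n - t or 2 * y < n + 3 * t + 1:
            continue                          # x or y fails its quantifier guard
        core = card(lambda r: r & 1 and r & 2 and not r & 4)  # x ∩ y ∩ f^c
        if 2 * core < n - t + 1:              # atomic guard g_{>= (n-t+1)/2}
            return False
    return True
```

The bound itself follows from inclusion-exclusion: $|x \cap y \cap \mathbf{f}^c| \geq |x| + |y| - |\mathbf{f}| - \mathbf{n} \geq (\mathbf{n}-\mathbf{t}) + \frac{\mathbf{n}+3\mathbf{t}+1}{2} - \mathbf{t} - \mathbf{n} = \frac{\mathbf{n}-\mathbf{t}+1}{2}$.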

**Semantics.** As TIP is essentially a subset of BAPA, we define its semantics by translating its formulas to BAPA, where most constructs directly correspond to BAPA constructs, and guards are translated to cardinality constraints:

$$\mathcal{B}(g\_{\geq \frac{\ell}{k}}(b)) \stackrel{\text{def}}{=} k \cdot |b| \geq \ell \qquad \mathcal{B}(\forall x : g. \,\,\varphi) \stackrel{\text{def}}{=} \forall x. \,\,\neg \mathcal{B}(g(x)) \lor \mathcal{B}(\varphi)$$

The notions of structures, satisfaction, equivalence, validity, satisfiability, etc. are inherited from BAPA. In particular, given a set of BAPA resilience conditions Γ over the parameters *Prm*, we say that a TIP formula $\varphi$ is Γ-valid, denoted $\Gamma \models \varphi$, if $\Gamma \models \mathcal{B}(\varphi)$.
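For instance, writing the thresholds of Example 1 as $\frac{\mathbf{n}-\mathbf{t}}{1}$, $\frac{\mathbf{n}+3\mathbf{t}+1}{2}$ and $\frac{\mathbf{n}-\mathbf{t}+1}{2}$ and applying the two rules for $\mathcal{B}$ unfolds Example 1 into the following BAPA formula (our own worked instance, with $1 \cdot |x|$ simplified to $|x|$); Example 1 is Γ-valid exactly when Γ entails it:

$$\forall x.\ \neg\big(|x| \geq \mathbf{n}-\mathbf{t}\big) \lor \Big(\forall y.\ \neg\big(2 \cdot |y| \geq \mathbf{n}+3\mathbf{t}+1\big) \lor 2 \cdot |x \cap y \cap \mathbf{f}^c| \geq \mathbf{n}-\mathbf{t}+1\Big)$$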

If Γ is quantifier-free (which is the typical case), Γ-validity of TIP formulas can be checked via validity checks of quantifier-free BAPA formulas, which are supported by mature solvers. Note that Γ-validity of a formula of the form $\forall x : g_{\geq t_1}.\ |x \cap b| \geq t_2$ is equivalent to $\Gamma \models \forall u.\ u \geq t_1 \to u + |b| - \mathbf{n} \geq t_2$. This replaces quantification over sets by quantification over integers, which improves the performance of existing solvers.
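The reduction from set quantification to integer quantification can be checked by brute force on small universes. The Python sketch below (our illustration; function names are hypothetical) compares both sides for a fixed $\mathbf{n}$, a set $b$ of a given size, and integer thresholds. In this small experiment the match relies on $1 \leq t_1 \leq \mathbf{n}$ and on $t_2 \geq 1$, i.e., a non-degenerate $t_2$ (cf. Definition 4); with $t_2 = 0$ the two sides can differ.

```python
from itertools import combinations


def set_side(n, b_size, t1, t2):
    """Every x ⊆ {0..n-1} with |x| >= t1 satisfies |x ∩ b| >= t2,
    where by symmetry we may take b = {0, ..., b_size - 1}."""
    b = set(range(b_size))
    return all(
        len(b.intersection(x)) >= t2
        for u in range(t1, n + 1)
        for x in combinations(range(n), u)
    )


def integer_side(n, b_size, t1, t2):
    """Integer reformulation: for all u >= t1, u + |b| - n >= t2.
    The left-hand side grows with u, so u = t1..n suffices (assumes t1 <= n)."""
    return all(u + b_size - n >= t2 for u in range(t1, n + 1))
```

The integer side only needs the smallest admissible $u$, since a set of exactly $t_1$ elements placed outside $b$ as far as possible minimizes $|x \cap b|$ to $\max(0, t_1 + |b| - \mathbf{n})$.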

#### **4.2 Translation to FOL**

To verify threshold-based protocols, we translate TIP formulas to FO axioms, using the threshold sorts and relations. To translate $g_{\geq t}(b)$, we follow the encoding of Sect. 3.2:

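$$\begin{aligned}
\mathrm{FO}(\neg\varphi) &= \neg\mathrm{FO}(\varphi) & \mathrm{FO}(n \in b^c) &= \neg\mathrm{FO}(n \in b)\\
\mathrm{FO}(\varphi_1 \land \varphi_2) &= \mathrm{FO}(\varphi_1) \land \mathrm{FO}(\varphi_2) & \mathrm{FO}(n \in \emptyset) &= \mathit{false}\\
\mathrm{FO}(\forall x : g_{\geq t}.\ \varphi) &= \forall x : \mathsf{set}_t.\ \mathrm{FO}(\varphi) & \mathrm{FO}(n \in \mathbf{a}) &= \mathit{member}_{\mathbf{a}}(n)\\
\mathrm{FO}(b \neq \emptyset) &= \exists n : \mathsf{node}.\ \mathrm{FO}(n \in b) & \mathrm{FO}(n \in b_1 \cap b_2) &= \mathrm{FO}(n \in b_1) \land \mathrm{FO}(n \in b_2)\\
\mathrm{FO}(b^c = \emptyset) &= \forall n : \mathsf{node}.\ \mathrm{FO}(n \in b) & \mathrm{FO}(n \in x) &= \mathit{member}_t(n, x) \quad \text{where } x \text{ is guarded by } t\\
\mathrm{FO}(g_{\geq t}(b)) &= \exists q : \mathsf{set}_t.\ \forall n : \mathsf{node}.\ \mathit{member}_t(n, q) \to \mathrm{FO}(n \in b) \quad (q \text{ fresh}) &&
\end{aligned}$$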

We lift FO to sets of formulas: $\mathrm{FO}(\Delta) = \{\mathrm{FO}(\varphi) \mid \varphi \in \Delta\}$.
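The translation rules are syntax-directed, so they can be sketched as a small recursive function. In the following Python sketch (our illustration, not the authors' implementation), TIP set terms and formulas are nested tuples with hypothetical tags, and FO formulas are rendered as strings in an ad-hoc syntax. It assumes TIP set variables are never named `n` or `q`, since those names are used for the introduced node and witness-set variables; the negation rule is omitted because TIP excludes negation.

```python
def fo_member(node, b, guards):
    """FO(node ∈ b) for a TIP set term b; `guards` maps set variables to thresholds."""
    kind = b[0]
    if kind == "param":                 # FO(n ∈ a) = member_a(n)
        return f"member_{b[1]}({node})"
    if kind == "var":                   # FO(n ∈ x) = member_t(n, x), x guarded by t
        return f"member_{guards[b[1]]}({node}, {b[1]})"
    if kind == "empty":                 # FO(n ∈ ∅) = false
        return "false"
    if kind == "comp":                  # FO(n ∈ b^c) = ¬FO(n ∈ b)
        return "~" + fo_member(node, b[1], guards)
    if kind == "inter":                 # FO(n ∈ b1 ∩ b2) = FO(n ∈ b1) ∧ FO(n ∈ b2)
        left = fo_member(node, b[1], guards)
        right = fo_member(node, b[2], guards)
        return f"({left} & {right})"
    raise ValueError(f"unknown set term: {b!r}")


def fo(phi, guards=None):
    """FO(phi) for a TIP formula, rendered as a string."""
    guards = dict(guards or {})
    kind = phi[0]
    if kind == "nonempty":              # FO(b ≠ ∅) = ∃n:node. FO(n ∈ b)
        return f"exists n:node. {fo_member('n', phi[1], guards)}"
    if kind == "full":                  # FO(b^c = ∅) = ∀n:node. FO(n ∈ b)
        return f"forall n:node. {fo_member('n', phi[1], guards)}"
    if kind == "guard":                 # ∃q:set_t. ∀n:node. member_t(n, q) → FO(n ∈ b)
        t, b = phi[1], phi[2]
        body = fo_member("n", b, guards)
        return f"exists q:set_{t}. forall n:node. member_{t}(n, q) -> {body}"
    if kind == "and":                   # FO(φ1 ∧ φ2) = FO(φ1) ∧ FO(φ2)
        return f"({fo(phi[1], guards)}) & ({fo(phi[2], guards)})"
    if kind == "forall":                # FO(∀x:g_{≥t}. φ) = ∀x:set_t. FO(φ)
        x, t, body = phi[1], phi[2], phi[3]
        guards[x] = t
        return f"forall {x}:set_{t}. {fo(body, guards)}"
    raise ValueError(f"unknown formula: {phi!r}")
```

Applied to Example 1 with hypothetical threshold labels `t1`, `t2`, `t3`, the sketch yields `forall x:set_t1. forall y:set_t2. exists q:set_t3. forall n:node. member_t3(n, q) -> ((member_t1(n, x) & member_t2(n, y)) & ~member_f(n))`.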

Next, we state the soundness of the translation, which intuitively means that FO(ϕ) is "equivalent" to ϕ over threshold-faithful FO structures (Definition 3). This justifies adding FO(ϕ) as a FO axiom whenever ϕ is Γ-valid.

**Theorem 1 (Translation soundness).** *Let* $s_C = (D_C, I_C)$ *be a first-order structure over* $\Sigma_C$*,* $s_B = (D_B, I_B)$ *a compatible BAPA structure over Prm, and* $s$ *the* $T$*-extension of* $s_C$ *by* $s_B$*. Then for every closed TIP formula* $\varphi$*, we have* $s_B \models \varphi \Leftrightarrow s \models \mathrm{FO}(\varphi)$*.*

**Corollary 1.** *For every closed TIP formula* $\varphi$ *such that* $\Gamma \models \varphi$*, we have that* $\mathrm{FO}(\varphi)$ *is satisfied by every threshold-faithful first-order structure.*

#### **5 Automatically Inferring Threshold Intersection Properties**

To apply the approach described in Sects. 3 and 4, it is crucial to find suitable threshold properties. That is, given the resilience conditions Γ and a FO transition system modeling the protocol, we need to find a set Δ of TIP formulas such that (i) $\Gamma \models \varphi$ for every $\varphi \in \Delta$, and (ii) the VCs of the transition system with the axioms FO(Δ) are valid.

In this section, we address the problem of automatically inferring such a set Δ. In particular, we prove that for any protocol that satisfies a natural condition, there are finitely many Γ-valid TIP formulas (up to equivalence), enabling a complete automatic inference algorithm. Furthermore, we show that (under certain reasonable conditions formalized in this section), the FO axioms resulting from the inferred TIP properties have an *acyclic* quantifier alternation graph, facilitating protocol verification in EPR.

**Notation.** For the rest of this section, we fix a set *Prm* of parameters, a set Γ of resilience conditions over *Prm*, and a set $T$ of thresholds. Note that $b \neq \emptyset \equiv g_{\geq 1}(b)$ and $b^c = \emptyset \equiv g_{\geq \mathbf{n}}(b)$. Therefore, for uniformity of the presentation, given a set $T$ of thresholds, we define $\hat T \stackrel{\text{def}}{=} T \cup \{1, \mathbf{n}\}$ and replace atomic formulas of the form $b \neq \emptyset$ and $b^c = \emptyset$ by the corresponding guard formulas. As such, the only atomic formulas are of the form $g_{\geq t}(b)$ where $t \in \hat T$. Note that guards in quantifiers are still restricted to $g_{\geq t}$ where $t \in T$. Given a set $Prm_S$, we also denote $\widehat{Prm}_S = Prm_S \cup \{\mathbf{a}^c \mid \mathbf{a} \in Prm_S\}$.

#### **5.1 Finding Consequences in the Threshold Intersection Property Language**

In this section, we present Aip, an algorithm for inferring all Γ-valid TIP formulas. A naïve (non-terminating) algorithm would iteratively check the Γ-validity of every TIP formula. Instead, Aip prunes the search space relying on the following condition:

**Definition 4.** $T$ *is* Γ-non-degenerate *if for every* $t \in T$ *it holds that* $\Gamma \not\models t \leq 0$*.*

If $\Gamma \models t \leq 0$, then $t$ is degenerate in the sense that $g_{\geq t}(b)$ is always Γ-valid, and $\forall x : g_{\geq t}.\ g_{\geq t'}(x \cap b)$ is never Γ-valid unless $t'$ is also degenerate (since $x$ may then be the empty set).

We observe that we can (i) push conjunctions outside of formulas (since ∀ distributes over ∧), and, assuming non-degeneracy, (ii) ignore terms of the form $x^c$:

**Lemma 2.** *If* $T$ *is* Γ*-feasible and* Γ*-non-degenerate, then for every* Γ*-valid* $\varphi$ *in TIP, there exist* $\varphi_1, \ldots, \varphi_m$ *s.t.* $\varphi \equiv \bigwedge_{i=1}^{m} \varphi_i$ *and for every* $1 \leq i \leq m$*,* $\varphi_i$ *is of the form:*

$$\forall x_1 : g_{\geq t_1} \ldots \forall x_q : g_{\geq t_q}.\ g_{\geq t}(x_1 \cap \ldots \cap x_q \cap a_1 \cap \ldots \cap a_k)$$

*where* $q + k > 0$*,* $t_1, \ldots, t_q \in T$*,* $t \in \hat T$*,* $a_1, \ldots, a_k \in \widehat{Prm}_S$*, and the* $a_i$*'s are distinct.*

We refer to $\varphi_i$ of the form above as *simple*, and refer to $g_{\geq t}$ as its *atomic guard*.

By Lemma 2, it suffices to generate all *simple* Γ-valid formulas. Next, we show that this can be done more efficiently by pruning the search space based on a subsumption relation that is checked syntactically, avoiding Γ-validity checks of the formulas themselves.

**Definition 5 (Subsumption).** *For every* $h_1, h_2 \in \hat T \cup \widehat{Prm}_S$*, we denote* $h_1 \succeq_\Gamma h_2$ *if one of the following holds: (1)* $h_1 = h_2$*, or (2)* $h_1, h_2 \in \hat T$ *and* $\Gamma \models h_1 \geq h_2$*, or (3)* $h_1 \in \widehat{Prm}_S$*,* $h_2 \in \hat T$ *and* $\Gamma \models |h_1| \geq h_2$*.*

For $h_1, h_2 \in \hat T$ and $h_3 \in \widehat{Prm}_S$, $h_1 \succeq_\Gamma h_2$ means that $\Gamma \models \forall x : g_{\geq h_1}.\ g_{\geq h_2}(x)$, and $h_3 \succeq_\Gamma h_2$ means that $\Gamma \models g_{\geq h_2}(h_3)$. We lift the relation $\succeq_\Gamma$ to act on simple formulas:

**Definition 6.** *Given simple formulas*

$$\begin{aligned} \alpha &= \forall x\_1 : g\_{\geq h\_1} \dots \forall x\_q : g\_{\geq h\_q} \cdot g\_{\geq t}(x\_1 \cap \dots \cap x\_q \cap h\_{q+1} \dots \cap h\_k) \\ \beta &= \forall x\_1 : g\_{\geq h\_1'} \dots \forall x\_{q'} : g\_{\geq h\_{q'}'} \cdot g\_{\geq t'}(x\_1 \cap \dots \cap x\_{q'} \cap h\_{q'+1}' \dots \cap h\_{k'}') \end{aligned}$$

*we say that* $\alpha \succeq_\Gamma \beta$ *if (i)* $t \succeq_\Gamma t'$*, and (ii) there exists an injective function* $f : \{1, \ldots, k'\} \to \{1, \ldots, k\}$ *s.t. for any* $1 \leq i \leq k'$ *it holds that* $h'_i \succeq_\Gamma h_{f(i)}$*.*

**Lemma 3.** *Let* α, β *be simple formulas such that* $\alpha \succeq_\Gamma \beta$*. If* $\Gamma \models \alpha$ *then* $\Gamma \models \beta$*.*
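Definitions 5 and 6 suggest a direct check. The Python sketch below is our illustration under stated assumptions: a simple formula is a pair (atomic threshold, components), each component being `("q", h)` for a quantified variable guarded by `h` or `("s", a)` for a set in $\widehat{Prm}_S$, and the entailment tests $\Gamma \models h_1 \geq h_2$ and $\Gamma \models |a| \geq h$ are passed in as hypothetical callbacks `thr_geq` and `card_geq` (in practice, small BAPA queries). The injective function $f$ is found by brute force over permutations, which is adequate for formulas with few components.

```python
from itertools import permutations


def comp_geq(c1, c2, thr_geq, card_geq):
    """Definition 5 on components: c1 ⪰_Γ c2."""
    if c1 == c2:
        return True                       # case (1): identical
    (k1, v1), (k2, v2) = c1, c2
    if k1 == "q" and k2 == "q":
        return thr_geq(v1, v2)            # case (2): Γ ⊨ h1 ≥ h2
    if k1 == "s" and k2 == "q":
        return card_geq(v1, v2)           # case (3): Γ ⊨ |h1| ≥ h2
    return False


def subsumes(alpha, beta, thr_geq, card_geq):
    """Definition 6: alpha ⪰_Γ beta for simple formulas (t, components)."""
    t, comps = alpha
    t_b, comps_b = beta
    if not (t == t_b or thr_geq(t, t_b)):  # (i) t ⪰_Γ t'
        return False
    # (ii) search for an injective f from beta's components into alpha's
    for image in permutations(range(len(comps)), len(comps_b)):
        if all(comp_geq(comps_b[i], comps[j], thr_geq, card_geq)
               for i, j in enumerate(image)):
            return True
    return False
```

For example, given the hypothetical facts $\Gamma \models \mathbf{n}-\mathbf{t} \geq \mathbf{t}+1$ and $\Gamma \models |\mathbf{f}^c| \geq \mathbf{n}-\mathbf{t}$, the formula $\forall x : g_{\geq \mathbf{n}-\mathbf{t}}.\ \forall y : g_{\geq \mathbf{n}-\mathbf{t}}.\ g_{\geq \mathbf{t}+1}(x \cap y)$ subsumes $\forall x : g_{\geq \mathbf{n}-\mathbf{t}}.\ g_{\geq \mathbf{t}+1}(x \cap \mathbf{f}^c)$ (instantiate $y$ with $\mathbf{f}^c$), but not vice versa.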

**Corollary 2.** *If no simple formula with* q *quantifiers is* Γ*-valid then no simple formula with more than* q *quantifiers is* Γ*-valid.*

Algorithm 1 depicts Aip, which generates all Γ-valid simple formulas, relying on Lemma 3. Aip uses a naïve search strategy; other strategies can be used (e.g., [26]). Based on Corollary 2, Aip terminates if for some number of quantifiers no Γ-valid formula is discovered.
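The following Python sketch approximates the naïve search strategy described above; it is our reconstruction, not a transcription of Algorithm 1. It enumerates simple formulas by increasing number of quantifiers, encoded as `(t, xs, consts)` (atomic threshold, multiset of quantifier thresholds, distinct constant sets), and uses Lemma 3 in both directions: a candidate subsumed by a known valid formula is valid, and a candidate that subsumes a known invalid formula is invalid. Only the remaining candidates reach `is_valid`, a hypothetical stand-in for a BAPA Γ-validity query. Following Corollary 2, it stops at the first quantifier count $q \geq 1$ with no valid formula (we do not stop at $q = 0$, where there may be no candidates at all); termination is only guaranteed under the conditions of Theorem 2.

```python
from itertools import combinations, combinations_with_replacement


def aip(thresholds, atomic_thresholds, params, is_valid, subsumes):
    """Naive search for Γ-valid simple formulas, pruned by subsumption."""
    valid, invalid = [], []
    q = 0
    while True:
        found_valid = False
        for xs in combinations_with_replacement(thresholds, q):
            for k in range(len(params) + 1):
                for consts in combinations(params, k):
                    if q + k == 0:
                        continue  # simple formulas need q + k > 0
                    for t in atomic_thresholds:
                        phi = (t, xs, consts)
                        if any(subsumes(v, phi) for v in valid):
                            ok = True        # Lemma 3: implied by a valid formula
                        elif any(subsumes(phi, w) for w in invalid):
                            ok = False       # Lemma 3, contrapositive
                        else:
                            ok = is_valid(phi)  # expensive solver query
                        (valid if ok else invalid).append(phi)
                        found_valid = found_valid or ok
        if q > 0 and not found_valid:
            return valid  # Corollary 2: no valid formula with more quantifiers
        q += 1
```

In a toy run with a single universe size $\mathbf{n} = 4$, $T = \{3\}$ and $\hat T = \{1, 3, 4\}$, where validity is decided by the closed-form minimum $\max(0, \sum_i t_i - (q-1)\mathbf{n})$ of a $q$-fold intersection (a stand-in, not Γ-validity), the sketch finds four valid formulas and queries the oracle for 7 of the 12 candidates.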


**Lemma 4 (Soundness).** *Every formula* ϕ *that is returned by the algorithm is* Γ*-valid.*

**Lemma 5 (Completeness).** *If* $T$ *is* Γ*-feasible and* Γ*-non-degenerate, then for every* Γ*-valid TIP formula* $\varphi$ *there exist* $\varphi_1, \ldots, \varphi_m$ *s.t.* $\varphi \equiv \bigwedge_{i=1}^{m} \varphi_i$ *and* Aip *yields every* $\varphi_i$*.*

Next, we characterize the cases in which there are finitely many Γ-valid TIP formulas, up to equivalence, and thus, Aip is guaranteed to terminate.

**Definition 7.** $T$ *is* Γ-sane *if for every* $t_1, t_2 \in T$*,* $\Gamma \not\models t_1 \leq 0 \lor t_2 > \mathbf{n} - 1$*.* $(T, Prm_S)$ *is* Γ-sane *if, in addition, for every* $t_1 \in T$*,* $a \in \widehat{Prm}_S$*,* $\Gamma \not\models t_1 \leq 0 \lor |a| = \mathbf{n}$*.*

**Theorem 2.** *Assume that* T *is* Γ*-feasible. Then the following conditions are equivalent: (1) There are finitely many* Γ*-valid simple formulas. (2) There are finitely many* Γ*-valid TIP formulas, up to equivalence. (3)* T *is* Γ-sane*.*

**Corollary 3 (Termination).** *If* T *is* Γ*-feasible and* Γ*-sane,* Aip *terminates.*

#### **5.2 From TIP to Axioms in EPR**

The set Δ of simple formulas generated by Aip is translated to FOL axioms as described in Sect. 4.2. Next, we show how to ensure that the quantifier alternation graph (Sect. 2) of FO(Δ) is acyclic. A simple formula $\varphi$ induces quantifier alternation edges in *QA*(FO($\varphi$)) from the sorts of its universal quantifiers to the sort of its atomic guard $g_{\geq t}$ (or, if $t = 1$, to the node sort). Therefore, by Lemma 3, for a Γ-valid $\varphi$, cycles in *QA*(FO($\varphi$)) may only occur if they occur in the graph obtained by $\succeq_\Gamma$. Furthermore, if *QA*(FO($\varphi$)) is cyclic, then the atomic guard must be equal to one of the quantifier guards. We refer to such a formula as a *cyclic formula*. We show that, under the following assumption, we can eliminate all cyclic formulas from Δ.

**Definition 8.** $T$ *is* Γ-acyclic *if for every* $t_1, t_2 \in T$*, if* $\Gamma \models t_1 = t_2$ *then* $t_1 = t_2$*.*

Intuitively, if T is not Γ-acyclic, then it has (at least) two "equivalent" thresholds, making one of them redundant. If that is the case, we can alter the protocol and its proof so that one of these guards is eliminated and the other one is used instead.

**Theorem 3.** *Let* $T$ *be* Γ*-feasible and* Γ*-acyclic, and let* $(T, Prm_S)$ *be* Γ*-sane. Let* Δ *be the set returned by* Aip*, and* $\Delta' = \{\varphi \in \Delta \mid \varphi \text{ is acyclic}\}$*. Then the VCs of the FO transition system with axioms* FO(Δ) *are valid iff they are valid with axioms* FO(Δ')*. Further, QA*(FO(Δ')) *is acyclic.*
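The edge rule described at the beginning of this subsection can be sketched in Python as follows. This is our illustration: formulas use a `(t, xs, consts)` encoding (atomic threshold, quantifier thresholds, constant sets), sorts are named `set_<t>` and `node`, and we assume, beyond what the text states, that an atomic guard with $t = \mathbf{n}$ contributes no edges, since FO($b^c = \emptyset$) is purely universal. Acyclicity is checked with `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later), which raises `CycleError` on any cycle, including the self-loop created by a cyclic formula.

```python
from graphlib import CycleError, TopologicalSorter


def qa_edges(phi, full="n"):
    """QA edges contributed by FO(phi) for a simple formula phi = (t, xs, consts).

    Each universal sort set_<t_i> gets an edge to the atomic guard's sort:
    set_<t>, or node when t == 1. Assumption (ours): t == full (the threshold n)
    translates to a purely universal formula and contributes no edges.
    """
    t, xs, _consts = phi
    if t == full:
        return set()
    target = "node" if t == 1 else f"set_{t}"
    return {(f"set_{ti}", target) for ti in xs}


def is_acyclic(formulas):
    """True iff the union of the QA edges of all formulas has no cycle."""
    predecessors = {}
    for phi in formulas:
        for src, dst in qa_edges(phi):
            predecessors.setdefault(dst, set()).add(src)
    try:
        tuple(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return False
    return True
```

For instance, Example 1 only adds edges into $\mathsf{set}_{t_3}$, whereas a formula whose atomic guard repeats one of its quantifier guards adds a self-loop; such cyclic formulas are exactly the ones Theorem 3 allows us to discard.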

#### **5.3 Finding Minimal Properties Required for a Protocol**

If Δ consists of *all* acyclic Γ-valid TIP formulas returned by Aip, using FO(Δ) as FO axioms leads to divergence of the verifier. To overcome this, we propose two variants.

**Minimal Equivalent ($\Delta_{\min}$).** Some of the formulas in FO(Δ) are implied by others, making them redundant. We remove such formulas using a greedy procedure that, for every $\varphi_i \in \Delta$, checks whether $\mathrm{FO}(\Delta \setminus \{\varphi_i\}) \models \mathrm{FO}(\varphi_i)$, and if so, removes $\varphi_i$ from Δ. Note that if *QA*(FO(Δ)) is acyclic, the check translates to (un)satisfiability in EPR.

This procedure results in $\Delta_{\min} \subseteq \Delta$ s.t. $\mathrm{FO}(\Delta_{\min}) \models \mathrm{FO}(\Delta)$ and no strict subset of $\Delta_{\min}$ satisfies this condition. That is, $\Delta_{\min}$ is a local minimum for that property.
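The greedy procedure for $\Delta_{\min}$ fits in a few lines of Python. In this sketch the entailment test is a callback `entails(premises, phi)`, a hypothetical stand-in for the EPR check of $\mathrm{FO}(\Delta \setminus \{\varphi_i\}) \models \mathrm{FO}(\varphi_i)$; only the control flow is shown. Because removals are decided one formula at a time, the result depends on the order of Δ and is a local minimum, as stated above.

```python
def minimize(delta, entails):
    """Greedy redundancy elimination: drop a formula when the others entail it."""
    kept = list(delta)
    i = 0
    while i < len(kept):
        rest = kept[:i] + kept[i + 1:]
        if entails(rest, kept[i]):
            kept = rest   # redundant; the next formula now sits at index i
        else:
            i += 1
    return kept
```

With a toy entailment in which formula $k$ reads "divisible by $k$" and a premise entails $k$ when it is a multiple of $k$, the list `[2, 3, 6, 4]` is reduced to `[6, 4]`.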

**Interpolant ($\Delta_{\mathit{int}}$).** There may exist $\Delta_{\mathit{int}} \subseteq \Delta$ s.t. $\mathrm{FO}(\Delta_{\mathit{int}}) \not\models \mathrm{FO}(\Delta)$, but $\mathrm{FO}(\Delta_{\mathit{int}})$ suffices to prove the first-order VCs and allows them to be discharged more efficiently. We compute such a set $\Delta_{\mathit{int}}$ iteratively. Initially, $\Delta_{\mathit{int}} = \emptyset$. In each iteration, we check the VCs. If a counterexample to induction (CTI) is found, we add to $\Delta_{\mathit{int}}$ a formula from Δ that is not satisfied by the CTI. In this approach, Δ is not pre-computed. Instead, Aip is invoked lazily to generate candidate formulas in reaction to CTIs.
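The lazy computation of $\Delta_{\mathit{int}}$ is a counterexample-guided loop. The Python sketch below shows its shape under hypothetical interfaces: `check_vcs(axioms)` returns `None` when the VCs are valid and a CTI otherwise (in the paper's setting, a call to Ivy), `candidates()` yields Γ-valid TIP formulas (e.g., produced lazily by Aip), and `satisfied_by(cti, phi)` tells whether the CTI satisfies FO(phi). Picking the first candidate falsified by the CTI is our simplification; the text does not specify how the formula is chosen.

```python
def lazy_axioms(check_vcs, candidates, satisfied_by):
    """Grow a set of axioms until the VCs hold, one CTI at a time."""
    axioms = []
    while True:
        cti = check_vcs(axioms)
        if cti is None:
            return axioms                      # VCs are valid under these axioms
        blocker = next((phi for phi in candidates()
                        if phi not in axioms and not satisfied_by(cti, phi)), None)
        if blocker is None:
            raise RuntimeError("no candidate formula rules out this CTI")
        axioms.append(blocker)                 # the CTI violates FO(blocker)
```

Each iteration adds one formula that excludes the current CTI, so the final set contains only formulas that some CTI actually required.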

#### **6 Evaluation**

We evaluate the approach by verifying several challenging threshold-based distributed protocols that use sophisticated thresholds: we verify the safety of Bosco [39] (presented in Sect. 3) under its 3 different resilience conditions, the safety and liveness (using the liveness-to-safety reduction presented in [30]) of Hybrid Reliable Broadcast [40], and the safety of Byzantine Fast Paxos [23]. Hybrid Reliable Broadcast tolerates four different types of faults, while Fast Byzantine Paxos is a fast-learning [21,22] Byzantine fault-tolerant consensus protocol; fast-learning protocols are notorious because two such algorithms, Zyzzyva [17] and FaB [28], were recently shown to be incorrect [1] despite having been published at major systems conferences.

**Implementation.** We implemented both algorithms described in Sect. 5.3. AipEager eagerly constructs <sup>Δ</sup> by running Aip, and then uses EPR reasoning to remove redundant formulas (whose FO representation is implied by the FO representation of others). To reduce the number of EPR validity checks used during this minimization step, we implemented an optimization that allows us to prove redundancy of TIP formulas internally based on an extension of the notion of subsumption from Sect. 5. AipLazy computes a subset of <sup>Δ</sup> while using Aip in a lazy fashion, guided by CTIs obtained from attempting to verify the FO transition system. Our implementations use CVC4 to discharge BAPA queries, and Z3 to discharge EPR queries. Verification of first-order transition systems is performed using Ivy, which internally uses Z3 as well. All experiments reported were performed on a laptop running 64-bit Windows 10, with a Core-i5 2.2 GHz CPU, using Z3 version 4.8.4, CVC4 version 1.7, and the latest version of Ivy.

Figure 2 lists the protocols we verified and the details of the evaluation. Each experiment was repeated 10 times, and we report the mean time (μ) and standard deviation (σ). The figure's caption explains the presented information, and we discuss the results below.

**Fig. 2.** Protocols verified using our technique. For each protocol, T is the set of thresholds and Γ is the resilience condition. AipEager lists metrics for the procedure of finding all Γ-valid TIP formulas (taking time **tC**), and verifying the transition system using the resulting properties (taking time **tv**). Obtaining a minimal subset that FO-implies the rest takes negligible time, so we did not include it in the table. The properties are given in $\Delta^{\text{Protocol}}_{\text{Eager}}$, where $g_i$ denotes $g_{\geq t_i}$. In addition to the run times, **V** shows *c*/*v*, where *c* is the number of Γ-valid simple formulas that were checked using the BAPA solver (CVC4), and *v* is the total number of Γ-valid simple formulas. Namely, *v* − *c* simple formulas were inferred to be valid via subsumption. **I** reports the analogous metric for Γ-invalid simple formulas. Finally, **Q** reports the maximal number of quantifiers considered (for which all formulas were Γ-invalid). AipLazy lists metrics for the procedure of finding a set of Γ-valid TIP formulas sufficient to prove the protocol based on counterexamples. The resulting set is listed in $\Delta^{\text{Protocol}}_{\text{Lazy}}$ and **tI** lists the total Ivy runtime, with the standard deviation specified below. **V** (resp. **I**) lists the number of Γ-valid (resp. Γ-invalid) simple formulas considered before the final set was reached. **CTI** lists the number of counterexample iterations required, and **Q** lists the maximal number of quantifiers of any TIP formula considered. Finally, **tv** lists the time required to verify the first-order transition system assuming the obtained set of properties. T.O. indicates that a timeout of 1 h was reached.

**AipEager.** For all protocols, running Aip took less than 1 min (column **tC**) and generated all Γ-valid simple TIP formulas. We observe that for most formulas, (in)validity is deduced from other formulas by subsumption, and only 2%–5% of the formulas are actually checked using a BAPA query. With the optimization of the redundancy check, minimization of the set is performed in negligible time. The resulting set, $\Delta_{\text{Eager}}$, contains 3–5 formulas, compared to 39–79 before minimization.

Due to the optimization described in Sect. 4 for the BAPA validity queries, the number of quantifiers in the TIP formulas that are checked by Aip does not affect the time needed to compute the full Δ. For example, Bosco under the Strongly One-step resilience condition has Γ-valid simple TIP formulas with up to 7 quantifiers (as $\mathbf{n} > 7\mathbf{t}$ and $t_1 = \mathbf{n} - \mathbf{t}$), but Aip does not take significantly longer to find Δ. Interestingly, in this example the Γ-valid TIP formulas with more than 3 quantifiers are implied (in FOL) by formulas with at most 3 quantifiers, as indicated by the fact that these are the only formulas that remain in $\Delta^{\text{Bosco Strongly One-step}}_{\text{Eager}}$.

**AipLazy.** With the lazy approach based on CTIs, the time for finding the set of TIP formulas, $\Delta_{\text{Lazy}}$, is generally longer. This is because the run time is dominated by calls to Ivy with FO axioms that are too weak for verifying the protocol. However, the resulting $\Delta_{\text{Lazy}}$ has a significant benefit: it lets Ivy prove the protocol much faster than with $\Delta_{\text{Eager}}$. Comparing **tv** for AipEager and AipLazy shows that when the former takes a minute, the latter takes a few seconds, and when the former times out after 1 h, the latter terminates, usually in under 1 min. Comparing the formulas of $\Delta_{\text{Eager}}$ and $\Delta_{\text{Lazy}}$ reveals the reason. While the FO translation of both yields EPR formulas, the formulas resulting from $\Delta_{\text{Eager}}$ contain more quantifiers and generate many more ground terms, which degrades the performance of Z3.

Another advantage of the lazy approach is that during the search, it avoids considering formulas with many quantifiers unless those are actually needed. Comparing the 3 versions of Bosco we see that AipLazy is not sensitive to the largest number of quantifiers that may appear in a Γ-valid simple TIP formula. The downside is that AipLazy performs many Ivy checks in order to compute the final ΔLazy. The total duration of finding CTIs varies significantly (as demonstrated under the column **tI**), in part because it is very sensitive to the CTIs returned by Ivy, which are in turn affected by the random seed used in the heuristics of the underlying solver.

Finally, <sup>Δ</sup>Lazy provides more insight into the protocol design, since it presents minimal assumptions that are required for protocol correctness. Thus, it may be useful in designing and understanding protocols.

#### **7 Related Work**

**Fully Automatic Verification of Threshold-Based Protocols.** Algorithms modeled as Threshold automata (TA) [14] have been studied in [13,16], and verified using an automated tool ByMC [15]. The tool also automatically synthesizes thresholds as arithmetic expressions [24]. Reachability properties of TAs for more general thresholds are studied in [18]. There have been recent advances in verification of synchronous threshold-based algorithms using TAs [41], and of asynchronous randomized algorithms where TAs support coin tosses and an unbounded number of rounds [4]. Still, this modeling is very restrictive and not as faithful to the pseudo-code as our modeling.

Another approach for full automation is to use sound and incomplete procedures for deduction and invariant search for logics that combine quantifiers and set cardinalities [8,10]. However, distributed systems of the level of complexity we consider here (e.g., Byzantine Fast Paxos) are beyond the reach of these techniques.

**Verification of Distributed Protocols Using Decidable Logics.** Padon et al. [33] introduced an interactive approach for the safety verification of distributed protocols based on EPR using the Ivy [29] verification tool. Later works extended the approach to more complex protocols [32], their implementations [42], and liveness properties [30,31]. Those works verified some threshold protocols using ad-hoc first-order modeling and axiomatization of threshold-intersection properties, whereas we develop a systematic methodology. Moreover, the axioms were not mechanically verified, except in [42], where a simple intersection property (the intersection of two sets with more than $\frac{\mathbf{n}}{2}$ nodes) requires a proof by induction over **n**. The proof relies on a user-provided induction hypothesis that is automatically checked using the FAU decidable fragment [9]. This approach requires user ingenuity even for a simple intersection property, and we expect that it would not scale to the more complex properties required for, e.g., Bosco or Fast Byzantine Paxos. In contrast, our approach completely automates both verification and inference of threshold-intersection properties required to verify protocol correctness.

Dragoi et al. [6] propose a decidable logic supporting cardinalities, uninterpreted functions, and universal quantifiers for verifying consensus algorithms expressed in the partially synchronous Heard-Of Model. As in this paper, the user is expected to provide an inductive invariant. The PSync framework [7] extends the approach to protocol implementations. Compared to our approach, the approach of Dragoi et al. is less flexible due to the specialized logic used and the restrictions of the Heard-Of Model.

Our approach decomposes verification into EPR and BAPA. Piskac [34] presents a decidable logic that combines BAPA and EPR, with some restrictions. The verification conditions of the protocols we consider are outside the scope of this fragment since they include cardinality constraints in the scope of quantifiers. Furthermore, this logic is not supported by mature solvers. Instead of looking for a specialized logic per protocol, we rely on a decomposition which allows more flexibility.

Recently, [11] presented an approach for verifying asynchronous algorithms by reduction to synchronous verification. This technique is largely orthogonal and complementary to our approach, which is focused on the challenge of cardinality thresholds.

*Verification using interactive theorem provers.* We are not aware of works based on interactive theorem provers that verified protocols with complex thresholds as we do in this work (although doing so is of course possible). However, many works used interactive theorem provers to verify related protocols, e.g., [12,27,36–38,43] (the most related protocols use either $\frac{n}{2}$ or $\frac{2n}{3}$ as the only thresholds; other protocols do not involve any thresholds). The downside of verification using interactive theorem provers is that it requires tremendous human effort and skill. For example, the Verdi proof of Raft included 50,000 lines of proof in Coq for 500 lines of code [44].

#### **8 Conclusion**

This paper proposes a new deductive verification approach for threshold-based distributed protocols by decomposing the verification problem into two well-established decidable logics, BAPA and EPR, thus allowing greater flexibility compared to monolithic approaches based on domain-specific, specialized logics. The user models their protocol in EPR, defines the thresholds and resilience conditions using arithmetic in BAPA, and provides an inductive invariant. An automatic procedure infers threshold intersection properties expressed in TIP that are both (1) sound w.r.t. the resilience conditions (checked in quantifier-free BAPA) and (2) sufficient to discharge the VCs (checked in EPR). Both logics are supported by mature solvers, and allow providing the user with an understandable counterexample in case verification fails.

Our evaluation, which includes notoriously tricky fast-learning consensus protocols, shows that threshold intersection properties are inferred in a matter of minutes. While this may be too slow for interactive use, we expect improvements such as memoization and parallelism to provide response times of a few seconds in an iterative, interactive setting. Another potential future direction is combining our inference algorithm with automated invariant inference algorithms.

**Acknowledgements.** We thank the anonymous referees for insightful comments which improved this paper. This publication is part of a project that has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No [759102-SVIS] and [787367-PaVeS]). The research was partially supported by Len Blavatnik and the Blavatnik Family foundation, the Blavatnik Interdisciplinary Cyber Research Center, Tel Aviv University, the Israel Science Foundation (ISF) under grant No. 1810/18, the United States-Israel Binational Science Foundation (BSF) grant No. 2016260 and the Austrian Science Fund (FWF) through Doctoral College LogiCS (W1255-N23).

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Gradual Consistency Checking**

Rachid Zennou<sup>1,2</sup> (✉), Ahmed Bouajjani<sup>1</sup>, Constantin Enea<sup>1</sup>, and Mohammed Erradi<sup>2</sup>

<sup>1</sup> Université de Paris, IRIF, CNRS, 75013 Paris, France rachid.zennou@gmail.com, {abou,cenea}@irif.fr <sup>2</sup> ENSIAS, University Mohammed V, Rabat, Morocco mohamed.erradi@gmail.com

**Abstract.** We address the problem of checking that computations of a shared memory implementation (with write and read operations) adhere to some given consistency model. It is known that checking conformance to Sequential Consistency (SC) for a given computation is NP-hard, and the same holds for checking Total Store Order (TSO) conformance. This poses a serious issue for the design of scalable verification or testing techniques for these important memory models. In this paper, we tackle this issue by providing an approach that avoids systematically hitting the worst-case complexity. The idea is to consider, as an intermediary step, the problem of checking weaker criteria that are as strong as possible while still being checkable in polynomial time (in the size of the computation). The criteria we consider are new variations of causal consistency suitably defined for our purpose. The advantage of our approach is that in many cases (1) it can catch violations of SC/TSO early using these efficiently checkable weaker criteria, and (2) when a computation is causally consistent (according to our newly defined criteria), the work done for establishing this fact significantly simplifies the work required for checking SC/TSO conformance. We have implemented our algorithms and carried out several experiments on realistic cache-coherence protocols showing the efficiency of our approach.

#### **1 Introduction**

This paper addresses the problem of checking whether a given implementation of a shared memory offers the expected consistency guarantees to its clients, which are concurrent programs composed of several threads running in parallel. Indeed, users of a memory need to see it as an abstract object on which they can perform concurrent reads and writes over a set of variables, conforming to some *memory model* that defines the valid visible sequences of such operations. Various memory models can be considered in this context. Sequential Consistency (SC) [24] is the model where operations can be seen as atomic, executing according to some interleaving of the operations issued by the different threads, while preserving the order in which these operations were issued by each of the threads. This fundamental model offers strong consistency in the sense that each write operation, when it is issued by a thread, is immediately visible to all the other threads. Other, weaker memory models are adopted in order to meet performance and/or availability requirements in concurrent/distributed systems. One of the most widely used models in this context is Total Store Order (TSO) [29]. In this model, writes can be delayed: after a write is issued, it is not immediately visible to all threads (except for the thread that issued it), and it is committed later, after some arbitrary delay. However, writes issued by the same thread are committed in the same order as they were issued, and when a write is committed it becomes visible to all the other threads simultaneously. TSO is implemented in hardware, but also in a distributed context over a network [22].

This work is supported in part by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 678177).

Implementing shared memories that are both highly performant and correct with respect to a given memory model is an extremely hard and error-prone task. Therefore, checking that a given implementation is indeed correct from this point of view is of paramount importance. In this paper we address the issue of checking that a given execution of a shared memory implementation is consistent, and we consider SC and TSO as consistency criteria.

Checking SC or TSO conformance is known to be NP-complete [18,21]. This is because, in order to justify that an execution is consistent, one has to find a total order between the writes that explains the read operations happening along the computation. It can be proved that one cannot avoid enumerating all the possible total orders between writes, in the worst case. The situation is different for weaker criteria such as Causal Consistency (CC) and its variations, which have been shown to be checkable in polynomial time (in the size of the computation) [6]. In fact, CC imposes fewer constraints than SC/TSO on the order between writes, and it imposes them "deterministically", in the sense that they can be derived from the history of the execution by a least fixpoint computation (which can be encoded, for instance, as a standard DATALOG program). All these complexity results hold under the assumption that each value is written at most once, which is without loss of generality for implementations that are data-independent [31], i.e., whose behavior doesn't depend on the concrete values read or written in the program. Indeed, any buggy behavior of such implementations can be exposed in executions satisfying this assumption<sup>1</sup>.

The intrinsic hardness of checking SC/TSO poses a crucial issue for the design of scalable verification or testing techniques for these important consistency models. Tackling this issue requires practical approaches that work well (with polynomial complexity) on problem instances that do not exhibit the worst-case (exponential) complexity.

<sup>1</sup> All the CC variations become NP-complete without the assumption that each value is written at most once [6]. This holds for the variations of CC we introduce in this paper as well.

The purpose of this paper is to propose such an approach. The idea is to reduce the amount of "nondeterminism" in searching for the write orders needed to establish SC/TSO conformance. For SC, our approach is to consider a weaker consistency model called CCM (for Convergent Causal Memory) that is "as strong as possible" while being checkable in polynomial time. In fact, CCM is stronger than both causal memory (CM) [2,26] and causal convergence (CCv) [7], two other well-known variations of causal consistency. If CCM is already violated by the given computation, then we can conclude that the computation does not satisfy the stronger criterion SC. The hope here is that in practice many computations violating SC can already be caught at this stage using a polynomial-time check. In the case that the computation does not violate CCM, we exploit the fact that establishing CCM already imposes a set of constraints on the order between writes. We show that these constraints form a partial order which *must* be a subset of any total write order witnessing SC conformance. Therefore, at this point, it is enough to find an extension of this partial write order, and the hope is that in many practical cases this set of constraints is already large enough, leaving only a small number of pairs of writes to be ordered in order to check SC conformance. For TSO, we proceed in the same way, but we consider a different intermediary, polynomial-time checkable criterion called *weak* CCM (wCCM). This is because some causality constraints need to be relaxed in order to take into account the program order relaxations of TSO, which allow reads to overtake writes. The definitions of the new criteria CCM and wCCM used in our approach are quite subtle. Ensuring that these criteria are "as strong as possible", by including all possible order constraints on pairs of writes that can be computed (in polynomial time) using a least fixpoint calculation, while still ensuring that they are weaker than SC/TSO, and proving this fact, is not trivial.

As a proof of concept, we implemented our approach for checking SC/TSO and applied it to executions extracted from realistic cache coherence protocols within the Gem5 simulator [5] in system emulation mode. This evaluation shows that our approach scales better than a direct encoding of the axioms defining SC and TSO [3] into Boolean satisfiability. We also show that, when the executions are valid, the partial order of writes imposed by the stronger criteria CCM and wCCM leaves only a small percentage of writes unordered (6.6% on average), and that most SC/TSO violations are also CCM/wCCM violations.

#### **2 Sequential Consistency and TSO**

We consider multi-threaded programs over a set of shared variables Var = {x, y, . . .}. Threads issue read and write operations. Assuming an unspecified set of values Val and a set of operation identifiers OId, we let

$$\mathsf{Op} = \{ \mathsf{read}_i(x, v), \mathsf{write}_i(x, v) : i \in \mathsf{OId}, x \in \mathsf{Var}, v \in \mathsf{Val} \}$$

be the set of operations reading a value v from, or writing a value v to, a variable x. We omit operation identifiers when they are not important. The set of read, resp., write, operations is denoted by R, resp., W. The set of read, resp., write, operations in a set of operations O is denoted by R(O), resp., W(O). The variable accessed by an operation o is denoted by var(o).

Consistency criteria like SC or TSO are formalized on an abstract view of an execution called a *history*. A history includes a set of write or read operations ordered according to a (partial) *program order* po, which orders operations issued by the same thread. Most often, po is a union of sequences, each sequence containing all the operations issued by some thread. Then, we assume that the history includes a *write-read* relation which identifies the write operation writing the value returned by each read in the execution. Such a relation can be extracted easily from executions where each value is written at most once. Since shared-memory implementations (or cache coherence protocols) are data-independent [31] in practice, i.e., their behavior doesn't depend on the concrete values read or written in the program, any potential buggy behavior can be exposed in such executions.

**Definition 1.** *A* history ⟨O, po, wr⟩ *is a set of operations* O *along with a strict partial* program order po *and a* write-read *relation* wr ⊆ W(O) × R(O)*, such that the inverse of* wr *is a total function and, if* (*write*(x, v), *read*(x′, v′)) ∈ wr*, then* x = x′ *and* v = v′*.*

We assume that every history includes a write operation writing the initial value of variable x, for each variable x. These write operations precede all other operations in po. We use h, h1, h2, ... to range over histories.

We now define the SC and TSO memory models (we use the same definitions as in the formal framework developed by Alglave et al. [3]). Given a *history* h = ⟨O, po, wr⟩ and a variable x, a *store order on* x is a strict total order ww<sub>x</sub> on the write operations write(x, _) in O. A *store order* is a union of store orders ww<sub>x</sub>, one for each variable x used in h. A *history* ⟨O, po, wr⟩ is *sequentially consistent* (SC, for short) if there exists a *store order* ww such that po ∪ wr ∪ ww ∪ rw is acyclic. The *read-write* relation rw is defined by rw = wr<sup>−1</sup> ◦ ww (where ◦ denotes the standard relation composition).

The definition of TSO relies on three additional relations: (1) the ppo relation, which excludes from the program order the pairs formed of a write followed by a read, i.e., ppo = po \ (W(O) × R(O)), (2) the po-loc relation, which is the restriction of po to operations accessing the same variable, i.e., po-loc = po ∩ {(o, o′) | var(o) = var(o′)}, and (3) the external write-read relation wr<sub>e</sub>, which is the restriction of the write-read relation to pairs of operations in different threads (not related by program order), i.e., wr<sub>e</sub> = wr ∩ {(o, o′) | (o, o′) ∉ po and (o′, o) ∉ po}. Then, we say that a history satisfies TSO if there exists a *store order* ww such that po-loc ∪ wr<sub>e</sub> ∪ ww ∪ rw and ppo ∪ wr<sub>e</sub> ∪ ww ∪ rw are both acyclic.
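To make the two axiomatic definitions concrete, here is a minimal Python sketch (our own illustration, not the authors' implementation) that checks whether a *given* store order ww witnesses SC or TSO. The encoding is an assumption: each operation is a hypothetical `Op` record, relations are Python sets of pairs, and the hypothetical helper `build_history` derives po (with initial writes first) and wr by matching values, which relies on each value being written at most once. The usage line runs the classic store-buffering litmus test, which is not one of the paper's figures. Note that, since initial writes precede every operation in po, reads of initial values are never external.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str  # unique identifier (an element of OId)
    kind: str  # "w" for write, "r" for read
    var: str
    val: int


def build_history(threads, init):
    """Return (po, wr) for per-thread lists of (kind, var, val) triples.

    Initial writes po-precede every other operation; each read is matched
    with the unique write of the same value on the same variable.
    """
    inits = [Op(f"init_{x}", "w", x, v) for x, v in init.items()]
    writes = {(w.var, w.val): w for w in inits}
    po, reads = set(), []
    for t, triples in enumerate(threads):
        seq = [Op(f"t{t}_{i}", k, x, v) for i, (k, x, v) in enumerate(triples)]
        po |= {(w, o) for w in inits for o in seq}
        po |= {(a, b) for i, a in enumerate(seq) for b in seq[i + 1:]}
        for o in seq:
            if o.kind == "w":
                writes[(o.var, o.val)] = o
            else:
                reads.append(o)
    wr = {(writes[(r.var, r.val)], r) for r in reads}
    return po, wr


def inverse(rel):
    return {(b, a) for (a, b) in rel}


def compose(r, s):
    """r ◦ s = {(a, c) | (a, b) in r and (b, c) in s}."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def is_acyclic(rel):
    """Kahn's algorithm: acyclic iff all nodes can be topologically sorted."""
    nodes = {n for pair in rel for n in pair}
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for a, b in rel:
        succ[a].append(b)
        indeg[b] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    done = 0
    while ready:
        n = ready.pop()
        done += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return done == len(nodes)


def satisfies_sc_with(po, wr, ww):
    rw = compose(inverse(wr), ww)  # rw = wr^-1 ◦ ww
    return is_acyclic(po | wr | ww | rw)


def satisfies_tso_with(po, wr, ww):
    ppo = {(a, b) for (a, b) in po if not (a.kind == "w" and b.kind == "r")}
    po_loc = {(a, b) for (a, b) in po if a.var == b.var}
    wr_e = {(a, b) for (a, b) in wr if (a, b) not in po and (b, a) not in po}
    rw = compose(inverse(wr), ww)
    return is_acyclic(po_loc | wr_e | ww | rw) and is_acyclic(ppo | wr_e | ww | rw)


# Store buffering: t0 = w(x,1); r(y,0)    t1 = w(y,1); r(x,0)
po, wr = build_history([[("w", "x", 1), ("r", "y", 0)],
                        [("w", "y", 1), ("r", "x", 0)]], {"x": 0, "y": 0})
w = {(o.var, o.val): o for pair in po for o in pair if o.kind == "w"}
ww = {(w[("x", 0)], w[("x", 1)]), (w[("y", 0)], w[("y", 1)])}
print(satisfies_sc_with(po, wr, ww), satisfies_tso_with(po, wr, ww))  # False True
```

With the only store order compatible with po, the SC relation contains the cycle w(x,1) → r(y,0) → w(y,1) → r(x,0) → w(x,1), while dropping the write-to-read program-order edges (ppo) breaks it, so the history is TSO but not SC.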

Notice that the formal definition of TSO given above is equivalent to the formal operational model of TSO in which each thread has a store buffer: each write issued by a thread is first sent to its store buffer before being committed to the memory later, in a nondeterministic way. To read a value of some variable x, a thread first checks whether there is still a write on x pending in its own buffer, and in this case it takes the value of the last such write; otherwise it fetches the value of x from the memory.
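The operational reading can be simulated directly. The sketch below uses a hypothetical `TSOMachine` class (our own naming) with one FIFO buffer per thread; the nondeterministic choice of when a write is committed is left to the caller, who invokes `commit` explicitly. It replays the store-buffering scenario: both threads read 0 because neither buffered write has reached memory, while a thread still sees its own pending write through its buffer.

```python
from collections import deque


class TSOMachine:
    """Operational TSO: a shared memory plus one FIFO store buffer per thread."""

    def __init__(self, init):
        self.memory = dict(init)
        self.buffers = {}

    def write(self, tid, var, val):
        # A write is only enqueued in the issuing thread's buffer.
        self.buffers.setdefault(tid, deque()).append((var, val))

    def read(self, tid, var):
        # The last pending write on var in tid's own buffer takes precedence.
        for pending_var, val in reversed(self.buffers.get(tid, ())):
            if pending_var == var:
                return val
        return self.memory[var]

    def commit(self, tid):
        # Commit the oldest pending write of tid; FIFO keeps per-thread order.
        var, val = self.buffers[tid].popleft()
        self.memory[var] = val


m = TSOMachine({"x": 0, "y": 0})
m.write(0, "x", 1)
m.write(1, "y", 1)
own = m.read(0, "x")  # 1, taken from thread 0's own buffer
r0 = m.read(0, "y")   # 0, thread 1's write is still buffered
r1 = m.read(1, "x")   # 0, thread 0's write is still buffered
m.commit(0)
m.commit(1)
print(own, r0, r1, m.memory)  # 1 0 0 {'x': 1, 'y': 1}
```

Because each buffer is a FIFO queue, writes of one thread always reach memory in issue order, and a committed write becomes visible to every thread at once, which are exactly the two guarantees described above.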

#### **3 Checking Sequential Consistency**

We define an algorithm for checking whether a history satisfies SC which enforces a polynomial-time checkable criterion weaker than SC, a variation of causal consistency, in order to construct a *partial* store order, i.e., one in which not all the writes on the same variable are ordered. This partial store order is then completed, using a standard backtracking enumeration, until it orders every two writes on the same variable. This approach is efficient when the number of writes that remain to be ordered by the backtracking enumeration is relatively small, a hypothesis confirmed by our experimental evaluation (see Sect. 5).

The variation of causal consistency mentioned above, called *convergent causal memory* (CCM, for short), is stronger than existing variations [6] while still being polynomial-time checkable (and weaker than SC). Its definition uses several relations between read and write operations which are analogous, or even identical, to the relations used to define those variations. Section 3.1 recalls the existing notions of causal consistency as they are defined in [6] (using the so-called "bad-pattern" characterization introduced in that paper), Sect. 3.2 introduces CCM, and Sect. 3.3 presents our algorithm for checking SC.

#### **3.1 Causal Consistency**

The weakest variation of causal consistency, called *weak causal consistency* (CC, for short), requires that any two causally-dependent values are observed in the same order by all threads, where causally-dependent means that either those values were written by the same thread (i.e., the corresponding writes are ordered by po), or that one value was written by a thread after reading the other value, or any transitive composition of such dependencies. Values written concurrently by two threads can be observed in any order, and even more, this order may change in time. A *history* ⟨O, po, wr⟩ satisfies CC if po ∪ wr ∪ rw[co] is acyclic, where co = (po ∪ wr)<sup>+</sup> is called the *causal relation*. The *read-write* relation rw[co] induced by the causal relation is defined by

$$\begin{aligned} (\mathsf{read}(x, v), \mathsf{write}(x, v')) &\in \mathsf{rw}[\mathsf{co}] \text{ iff } (\mathsf{write}(x, v), \mathsf{write}(x, v')) \in \mathsf{co} \text{ and} \\ &\quad (\mathsf{write}(x, v), \mathsf{read}(x, v)) \in \mathsf{wr}, \text{ for some } \mathsf{write}(x, v) \end{aligned}$$

The read-write relation rw[co] is a variation of rw from the definition of SC/TSO where the store order ww is replaced by the projection of co on pairs of writes. In general, given a binary relation R on operations, R<sub>WW</sub> denotes the projection of R on pairs of writes on the same variable. Then,

**Definition 2.** *The read-write relation* rw[R] *induced by a relation* R *is defined by* rw[R] = wr<sup>−1</sup> ◦ R<sub>WW</sub>*.*

*Causal convergence* (CCv, for short) is a strengthening of CC where concurrent values are required to be observed in the same order by all threads.

A *history* ⟨O, po, wr⟩ satisfies CCv if it satisfies CC and po ∪ wr ∪ cf is acyclic, where the *conflict relation* cf is defined by

$$\begin{aligned} (\mathsf{write}(x, v), \mathsf{write}(x, v')) \in \mathsf{cf} \text{ iff } & (\mathsf{write}(x, v), \mathsf{read}(x, v')) \in \mathsf{co} \text{ and} \\ & (\mathsf{write}(x, v'), \mathsf{read}(x, v')) \in \mathsf{wr}, \text{ for some } \mathsf{read}(x, v') \end{aligned}$$

The conflict relation relates two writes w<sub>1</sub> and w<sub>2</sub> when w<sub>1</sub> is causally related to a read taking its value from w<sub>2</sub>. The definition of CCM, our new variation of causal consistency, relies on a generalization of the conflict relation where a different relation is used instead of co. Given a binary relation R on operations, R<sub>WR</sub> denotes the projection of R on pairs of writes and reads on the same variable, respectively.

**Definition 3.** *The conflict relation* cf[R] *induced by a relation* R *is defined by* cf[R] = R<sub>WR</sub> ◦ wr<sup>−1</sup>*.*
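Definitions 2 and 3 translate almost literally into set operations. The following sketch (hypothetical helper names, relations as sets of pairs) computes co as a least fixpoint and then checks CC and CCv. One interpretive choice is ours: cf[R] is restricted to pairs of distinct writes, since wr ⊆ co and any write that is read would otherwise be in conflict with itself. The two-thread history, in which each thread writes x and then reads the other thread's value, is our own illustration; it satisfies CC but not CCv.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    kind: str  # "w" for write, "r" for read
    var: str
    val: int


def inverse(rel):
    return {(b, a) for (a, b) in rel}


def compose(r, s):
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def closure(rel):
    """Transitive closure R+, computed as a least fixpoint by naive iteration."""
    result = set(rel)
    while True:
        new = compose(result, result) - result
        if not new:
            return result
        result |= new


def is_acyclic(rel):
    # A relation is acyclic iff its transitive closure is irreflexive.
    return all(a != b for (a, b) in closure(rel))


def proj(rel, k1, k2):
    """Pairs of rel whose ends have kinds k1 and k2 and access the same variable."""
    return {(a, b) for (a, b) in rel
            if a.kind == k1 and b.kind == k2 and a.var == b.var}


def rw_of(rel, wr):
    return compose(inverse(wr), proj(rel, "w", "w"))  # Definition 2


def cf_of(rel, wr):
    # Definition 3, keeping only pairs of distinct writes.
    return {(a, b) for (a, b) in compose(proj(rel, "w", "r"), inverse(wr)) if a != b}


def satisfies_cc(po, wr):
    co = closure(po | wr)
    return is_acyclic(po | wr | rw_of(co, wr))


def satisfies_ccv(po, wr):
    co = closure(po | wr)
    return satisfies_cc(po, wr) and is_acyclic(po | wr | cf_of(co, wr))


# t0 = w(x,1); r(x,2)    t1 = w(x,2); r(x,1)    (initial write w(x,0) first)
ix = Op("ix", "w", "x", 0)
a1, a2 = Op("a1", "w", "x", 1), Op("a2", "r", "x", 2)
b1, b2 = Op("b1", "w", "x", 2), Op("b2", "r", "x", 1)
po = {(ix, o) for o in (a1, a2, b1, b2)} | {(a1, a2), (b1, b2)}
wr = {(b1, a2), (a1, b2)}
print(satisfies_cc(po, wr), satisfies_ccv(po, wr))  # True False
```

The two writes on x are causally unrelated, so rw[co] adds nothing and CC holds; cf, however, orders each write before the other (each thread's own write is causally before the read of the other value), which is the disagreement CCv forbids.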



**Fig. 1.** Histories with two threads used to compare different consistency models. Operations of the same thread are aligned vertically.

Finally, *causal memory* (CM, for short) is a strengthening of CC where, roughly, concurrent values are required to be observed in the same order by a thread during its entire execution. Differently from CCv, this order can differ from one thread to another. Although this intuitive description seems to imply that CM is weaker than CCv, the two models are actually incomparable. For instance, the history in Fig. 1a is allowed by CM, but not by CCv. It is not allowed by CCv because reading 1 from x in the first thread implies that it observed write(x, 1) after write(x, 2), while reading 2 from x in the second thread implies that it observed write(x, 2) after write(x, 1). This is allowed by CM, where different threads can observe concurrent writes in different orders, but not by CCv. Conversely, the history in Fig. 1b is CCv but not CM. It is not allowed by CM because reading the initial value 0 from z implies that write(x, 1) is observed after write(x, 2), while reading 2 from x implies that write(x, 2) is observed after write(x, 1) (write(x, 1) must have been observed because the same thread reads 1 from y and the writes on x and y are causally related). However, under CCv, a thread simply reads the most recent value of each variable, and the order between these values (fixed, for instance, using timestamps) is independent of the order in which variables are read by a thread; e.g., reading 0 from z doesn't imply that the timestamp of write(x, 2) is smaller than the timestamp of write(x, 1). This history is admitted by CCv assuming that write(x, 1) is observed before write(x, 2).

Let us give the formal definition of CM. Let h = ⟨O, po, wr⟩ be a history. For every operation o in h, let hb<sub>o</sub> be the smallest transitive relation such that:


A history ⟨O, po, wr⟩ satisfies CM if it satisfies CC and, for each operation o in the history, the relation hb<sub>o</sub> is acyclic.

Bouajjani et al. [6] show that checking whether a history satisfies CC, CCv, or CM can be done in polynomial time. This result is a straightforward consequence of the above definitions, since the union of relations required to be acyclic can be computed in polynomial time from the relations po and wr, which are fixed in a given history. In particular, the union of these relations can be computed by a DATALOG program.

#### **3.2 Convergent Causal Memory**

We define a new variation of causal consistency which builds on causal memory but, similarly to causal convergence, enforces that all threads agree on an order in which to observe values written by concurrent (causally-unrelated) writes; it also uses a larger read-write relation. A history ⟨O, po, wr⟩ satisfies *convergent causal memory* (CCM, for short) if po ∪ wr ∪ pww ∪ rw[pww] is acyclic, where the *partial store order* pww is defined by

$$\mathsf{pww} = (\mathsf{hb}_{\mathsf{WW}} \cup \mathsf{cf}[\mathsf{hb}])^{+} \text{ with } \mathsf{hb} = \left(\bigcup_{o \in O} \mathsf{hb}_o\right)^{+}.$$

The partial store order pww contains the ordering constraints between writes in all the relations hb<sub>o</sub> used to define causal memory, and also the conflict relation induced by this set of constraints (a weaker version of the conflict relation was used to define causal convergence).
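Computing pww requires hb, whose defining rules (the list following "smallest transitive relation such that") are not reproduced in this text, so the sketch below takes hb as an input. In the usage example we pass co in its place: co ⊆ hb (as used in the proof of Lemma 1), and the acyclicity condition only grows with hb, so this under-approximates pww. Any cycle it reports is therefore a genuine CCM violation, while an acyclic answer is conclusive only for the true hb. The first history is our reconstruction from the argument about Fig. 1c given after Lemma 1 (initial writes omitted, since no read returns an initial value); all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    kind: str  # "w" for write, "r" for read
    var: str
    val: int


def inverse(rel):
    return {(b, a) for (a, b) in rel}


def compose(r, s):
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def closure(rel):
    result = set(rel)
    while True:
        new = compose(result, result) - result
        if not new:
            return result
        result |= new


def is_acyclic(rel):
    return all(a != b for (a, b) in closure(rel))


def proj(rel, k1, k2):
    return {(a, b) for (a, b) in rel
            if a.kind == k1 and b.kind == k2 and a.var == b.var}


def partial_store_order(hb, wr):
    """pww = (hb_WW ∪ cf[hb])+, with cf[hb] restricted to distinct writes."""
    cf = {(a, b) for (a, b) in compose(proj(hb, "w", "r"), inverse(wr)) if a != b}
    return closure(proj(hb, "w", "w") | cf)


def satisfies_ccm(po, wr, hb):
    pww = partial_store_order(hb, wr)
    rw_pww = compose(inverse(wr), proj(pww, "w", "w"))  # rw[pww]
    return is_acyclic(po | wr | pww | rw_pww)


# tA = w(x,1); w(x,2); r(y,1)    tB = w(y,1); w(y,2); r(x,1)
a1, a2, a3 = Op("a1", "w", "x", 1), Op("a2", "w", "x", 2), Op("a3", "r", "y", 1)
b1, b2, b3 = Op("b1", "w", "y", 1), Op("b2", "w", "y", 2), Op("b3", "r", "x", 1)
po = {(a1, a2), (a1, a3), (a2, a3), (b1, b2), (b1, b3), (b2, b3)}
wr = {(b1, a3), (a1, b3)}
print(satisfies_ccm(po, wr, closure(po | wr)))  # False (co used in place of hb)

# A single thread that writes x and reads it back satisfies CCM.
w1, r1 = Op("w1", "w", "x", 1), Op("r1", "r", "x", 1)
print(satisfies_ccm({(w1, r1)}, {(w1, r1)}, {(w1, r1)}))  # True
```

In the first history the cycle is r(x,1) → w(x,2) → r(y,1) → w(y,2) → r(x,1), alternating rw[pww] and po edges, exactly as in the argument for Fig. 1c.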

As a first result, we show that all the variations of causal consistency in Sect. 3.1, i.e., CC, CCv and CM, are strictly weaker than CCM.

#### **Lemma 1.** *If a history satisfies CCM, then it satisfies CC, CCv and CM.*

*Proof.* Let h = ⟨O, po, wr⟩ be a history satisfying CCM. By the definition of hb, we have that co<sub>WW</sub> ⊆ hb<sub>WW</sub>. Indeed, any two writes o<sub>1</sub> and o<sub>2</sub> related by co are also related by hb<sub>o<sub>2</sub></sub>, which, by the definition of hb, implies that they are related by hb<sub>WW</sub>. Then, by the definition of pww, we have that hb<sub>WW</sub> ⊆ pww. This implies that rw[co] ⊆ rw[pww] (by definition, rw[co] = rw[co<sub>WW</sub>]). Therefore, the acyclicity of po ∪ wr ∪ pww ∪ rw[pww] implies that its subset po ∪ wr ∪ rw[co] is also acyclic, which means that h satisfies CC. Also, it implies that po ∪ wr ∪ cf[hb] is acyclic (the last term of the union is included in pww), which, by co ⊆ hb, implies that po ∪ wr ∪ cf[co] is acyclic, and thus, h satisfies CCv. The fact that h satisfies CM follows from the fact that h satisfies CC (since po ∪ wr is acyclic) and hb is acyclic (hb<sub>WW</sub> is included in pww and the rest of the dependencies in hb are included in po ∪ wr). ∎

The converse of the above lemma doesn't hold. Figure 1c shows a history which satisfies CM and CCv but not CCM. To show that this history does not satisfy CCM, we use the fact that pww relates any two writes which are ordered by program order. Then, we get that read(x, 1) and write(x, 2) are related by rw[pww] (because write(x, 1) is related by write-read to read(x, 1)), which further implies that (read(x, 1), read(y, 1)) ∈ rw[pww] ◦ po. Similarly, we have that (read(y, 1), read(x, 1)) ∈ rw[pww] ◦ po, which implies that po ∪ wr ∪ pww ∪ rw[pww] is *not* acyclic, and therefore, the history does not satisfy CCM. The fact that this history satisfies CM and CCv follows easily from the definitions.

Next, we show that CCM is weaker than SC, which will be important in our algorithm for checking whether a history satisfies SC.

#### **Lemma 2.** *If a history satisfies SC, then it satisfies CCM.*

*Proof.* Let h = ⟨O, po, wr⟩ be a history satisfying SC. Then, there exists a *store order* ww such that po ∪ wr ∪ ww ∪ rw[ww] is acyclic (note that rw = rw[ww], since ww only relates writes on the same variable). We show that the two relations hb<sub>WW</sub> and cf[hb], whose union generates pww (via transitive closure), are both included in ww. We first prove that hb ⊆ (po ∪ wr ∪ ww ∪ rw[ww])<sup>+</sup> by structural induction on the definition of hb<sub>o</sub>:

1. if (o<sub>1</sub>, o<sub>2</sub>) ∈ co = (po ∪ wr)<sup>+</sup>, then clearly, (o<sub>1</sub>, o<sub>2</sub>) ∈ (po ∪ wr ∪ ww ∪ rw[ww])<sup>+</sup>,

2. if (write(x, v), read(x, v′)) ∈ (po ∪ wr ∪ ww ∪ rw[ww])<sup>+</sup>, where read(x, v′) takes its value from write(x, v′), i.e., (write(x, v′), read(x, v′)) ∈ wr, then (write(x, v), write(x, v′)) ∈ ww. Otherwise, assuming by contradiction that (write(x, v′), write(x, v)) ∈ ww, we get that (read(x, v′), write(x, v)) ∈ rw[ww] (by the definition of rw[ww], using the hypothesis (write(x, v′), read(x, v′)) ∈ wr). Together with the path from write(x, v) to read(x, v′), this implies that po ∪ wr ∪ ww ∪ rw[ww] is cyclic, a contradiction.

Since hb ⊆ (po ∪ wr ∪ ww ∪ rw[ww])<sup>+</sup>, this closure is acyclic, and ww totally orders the writes on each variable, we get that hb<sub>WW</sub> ⊆ ww. Also, since cf[(po ∪ wr ∪ ww ∪ rw[ww])<sup>+</sup>] ⊆ (po ∪ wr ∪ ww ∪ rw[ww])<sup>+</sup> (using a similar argument as in point (2) above), we get that cf[hb] ⊆ (po ∪ wr ∪ ww ∪ rw[ww])<sup>+</sup>, and thus, by the same totality argument, cf[hb] ⊆ ww.

Finally, since pww ⊆ ww, we get that (po ∪ wr ∪ pww ∪ rw[pww])<sup>+</sup> ⊆ (po ∪ wr ∪ ww ∪ rw[ww])<sup>+</sup>, so the acyclicity of the latter implies the acyclicity of the former. Therefore, h satisfies CCM. ∎

The converse of the above lemma doesn't hold. For instance, the history in Fig. 1d is not SC but it is CCM. This history admits a partial store order pww in which the writes in different threads are not ordered.

**Fig. 2.** Relationships between consistency models. Directed arrows denote the "weaker-than" relation while dashed lines connect incomparable models.

The left side of Fig. 2 (ignoring wCCM and TSO) summarizes the relationships between the consistency models presented in this section.

The partial store order pww can be computed in polynomial time (in the size of the input history). Indeed, the hb<sub>o</sub> relations can be computed using a least fixpoint calculation that converges in at most a quadratic number of iterations, and acyclicity can be decided in polynomial time. Therefore,

**Theorem 1.** *Checking whether a history satisfies CCM is polynomial time in the size of the history.*

#### **3.3 An Algorithm for Checking Sequential Consistency**

Algorithm 1 checks whether a given history satisfies sequential consistency. As a first step, it checks whether the given history satisfies CCM. If this is not the case then, by Lemma 2, the history does not satisfy SC either, and the algorithm returns *false*. Otherwise, it enumerates store orders which extend the partial store order pww, until finding one that witnesses SC. The history violates SC iff no such store order is found. The soundness of this last step follows from the proof of Lemma 2, which shows that pww is included in any store order ww witnessing SC.

**Theorem 2.** *Algorithm 1 returns true iff the input history* h *satisfies SC.*

```
Input: A history h = ⟨O, po, wr⟩
Output: true iff h satisfies SC
if po ∪ wr ∪ pww ∪ rw[pww] is cyclic then
    return false;
end
foreach store order ww ⊇ pww do
    if po ∪ wr ∪ ww ∪ rw[ww] is acyclic then
        return true;
    end
end
return false;
```
**Algorithm 1.** Checking SC conformance.
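Algorithm 1 can be prototyped in a few lines. The sketch below (hypothetical names, with pww supplied by the caller) enumerates, variable by variable, the total orders that extend pww, which is where the exponential cost lies. Passing any sub-relation of pww keeps the answer correct: a cycle found with fewer constraints is still a cycle with pww, and every SC witness extends pww, hence any subset of it; only the amount of enumeration grows. The examples therefore use just the program-order constraints between initial and later writes, which belong to pww since co<sub>WW</sub> ⊆ pww.

```python
from dataclasses import dataclass
from itertools import permutations, product


@dataclass(frozen=True)
class Op:
    name: str
    kind: str  # "w" for write, "r" for read
    var: str
    val: int


def inverse(rel):
    return {(b, a) for (a, b) in rel}


def compose(r, s):
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def is_acyclic(rel):
    """Repeatedly remove nodes with no incoming edge; a cycle blocks this."""
    edges = set(rel)
    nodes = {n for pair in edges for n in pair}
    while nodes:
        sources = nodes - {b for (_, b) in edges}
        if not sources:
            return False
        nodes -= sources
        edges = {(a, b) for (a, b) in edges if a not in sources}
    return True


def linear_extensions(writes, pww):
    """All strict total orders on `writes` (one variable) that contain pww."""
    for perm in permutations(writes):
        pos = {w: i for i, w in enumerate(perm)}
        if all(pos[a] < pos[b] for (a, b) in pww if a in pos and b in pos):
            yield {(perm[i], perm[j])
                   for i in range(len(perm)) for j in range(i + 1, len(perm))}


def check_sc(ops, po, wr, pww):
    """Algorithm 1; pww (or any sub-relation of it) is supplied by the caller."""
    if not is_acyclic(po | wr | pww | compose(inverse(wr), pww)):
        return False  # CCM is violated, hence SC is violated (Lemma 2)
    by_var = {}
    for o in ops:
        if o.kind == "w":
            by_var.setdefault(o.var, []).append(o)
    per_var = [list(linear_extensions(ws, pww)) for ws in by_var.values()]
    for choice in product(*per_var):  # every store order ww ⊇ pww
        ww = set().union(*choice)
        if is_acyclic(po | wr | ww | compose(inverse(wr), ww)):
            return True
    return False


ix, iy = Op("ix", "w", "x", 0), Op("iy", "w", "y", 0)
# Store buffering: t0 = w(x,1); r(y,0)    t1 = w(y,1); r(x,0)
a1, a2 = Op("a1", "w", "x", 1), Op("a2", "r", "y", 0)
b1, b2 = Op("b1", "w", "y", 1), Op("b2", "r", "x", 0)
sb_ops = [ix, iy, a1, a2, b1, b2]
sb_po = {(i, o) for i in (ix, iy) for o in (a1, a2, b1, b2)} | {(a1, a2), (b1, b2)}
sb_wr = {(iy, a2), (ix, b2)}
print(check_sc(sb_ops, sb_po, sb_wr, {(ix, a1), (iy, b1)}))  # False

# t0 = w(x,1)    t1 = w(x,2)    t2 = r(x,1); r(x,2)
c1, d1 = Op("c1", "w", "x", 1), Op("d1", "w", "x", 2)
e1, e2 = Op("e1", "r", "x", 1), Op("e2", "r", "x", 2)
ops2 = [ix, c1, d1, e1, e2]
po2 = {(ix, o) for o in (c1, d1, e1, e2)} | {(e1, e2)}
wr2 = {(c1, e1), (d1, e2)}
print(check_sc(ops2, po2, wr2, {(ix, c1), (ix, d1)}))  # True
```

On the store-buffering history the first, polynomial check already fails, so no store order is enumerated; the second history needs the enumeration and is accepted with the store order w(x,0), w(x,1), w(x,2).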
#### **4 Checking Conformance to the TSO Model**

We now consider the problem of checking whether a history satisfies TSO. Following the approach developed above for SC, we define a polynomial-time checkable criterion, based on a (different) variation of causal consistency that is suitable for TSO. This allows us to reduce the number of pairs of writes for which an order must be guessed in order to establish conformance to TSO.

The case of TSO requires the definition of a new intermediary consistency model because CCM is based on a causality order that includes the program order po, which is relaxed in TSO compared to SC. Indeed, CCM is *not* weaker than TSO, as shown by the history in Fig. 1b (note that this does not imply that the other variations of causal consistency, CC and CCv, are not weaker than TSO either). This history satisfies TSO because, based on its operational model, the operation write(x, 2) of thread t<sub>1</sub> can be delayed (pending in the store buffer of t<sub>1</sub>) until the end of the execution. Therefore, after read(z, 0) is executed, all the writes of thread t<sub>0</sub> are committed to the main memory, so that thread t<sub>1</sub> can read 1 from y and 2 from x (it is obliged to read the value of x from its own store buffer). This history is not admitted by CCM because it is not admitted by the weaker causal consistency variation CM. Figure 3 shows a history admitted by CCM but not by TSO. Indeed, under TSO, both t<sub>2</sub> and t<sub>3</sub> should see the writes on x and y performed by t<sub>0</sub> and t<sub>1</sub>, respectively, in the same order. This is not the case, because t<sub>2</sub> "observes" the write on x before the write on y (since it reads 0 from y) and t<sub>3</sub> "observes" the write on y before the write on x (since it reads 0 from x). This history is admitted by CCM because the two writes are causally independent and they concern different variables. We mention that TSO and CM are also incomparable. For instance, the history in Fig. 1a is allowed by CM, but not by TSO, while the history in Fig. 1b is admitted by TSO, but not by CM.

Next, we define a weakening of CCM, called *weak convergent causal memory* (wCCM), which is also weaker than TSO. The model wCCM is based on causality relations induced by the relaxed program orders ppo and po-loc instead of po, and the external write-read relation instead of the full write-read relation.


**Fig. 3.** A history admitted by wCCM and CCM but not by TSO.

#### **4.1 Weak Convergent Causal Memory**

First, we define two causality relations relative to the partial program orders in the definition of TSO and the external write-read relation: for π ∈ {ppo, po-loc}, let co<sup>π</sup> = (π ∪ wr<sub>e</sub>)<sup>+</sup>. We also consider a notion of conflict that is defined in terms of the external write-read relation as follows: for a given relation R, let cf<sub>e</sub>[R] = R<sub>WR</sub> ◦ wr<sub>e</sub><sup>−1</sup>.

Then, given a history ⟨O, po, wr⟩, we define for each operation o two happens-before relations hb<sup>ppo</sup><sub>o</sub> and hb<sup>po-loc</sup><sub>o</sub>. The definition of these relations is similar to that of hb<sub>o</sub> (from causal memory), the differences being that po is replaced by ppo and po-loc respectively, co is replaced by co<sup>ppo</sup> and co<sup>po-loc</sup> respectively, and wr is replaced by wr<sub>e</sub>. Therefore, for π ∈ {ppo, po-loc}, hb<sup>π</sup><sub>o</sub> is the smallest transitive relation such that:


Let hb<sup>π</sup> = (⋃<sub>o∈O</sub> hb<sup>π</sup><sub>o</sub>)<sup>+</sup>, for π ∈ {ppo, po-loc}, and let whb = (hb<sup>ppo</sup> ∪ hb<sup>po-loc</sup>)<sup>+</sup>. Then, the weak partial store order is defined as follows:

$$\mathsf{wpww} = (\mathsf{whb}_{\mathsf{WW}} \cup \mathsf{cf}_{e}[\mathsf{hb}^{\mathsf{po\text{-}loc}}] \cup \mathsf{cf}_{e}[\mathsf{hb}^{\mathsf{ppo}}])^{+}$$

Then, we say that a history O, po,wr satisfies *weak convergent causal memory* (wCCM) if both relations:

ppo ∪ wr<sub>e</sub> ∪ wpww ∪ rw[wpww] and po-loc ∪ wr<sub>e</sub> ∪ wpww ∪ rw[wpww]

are acyclic.
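A sketch of the wCCM check follows the same pattern. Because the rules defining hb<sup>π</sup><sub>o</sub> are not reproduced in this text, the hypothetical function `satisfies_wccm` takes hb<sup>ppo</sup> and hb<sup>po-loc</sup> as inputs. The usage example passes co<sup>π</sup> in their place, assuming (by analogy with co ⊆ hb, since hb<sup>π</sup><sub>o</sub> is obtained from hb<sub>o</sub> by replacing co with co<sup>π</sup>) that co<sup>π</sup> ⊆ hb<sup>π</sup>. As for CCM, a reported cycle is then a genuine violation, while an acyclic answer is conclusive only for the true relations. As in our reading of Definition 3, cf<sub>e</sub> is restricted to distinct writes. Both histories are our own small examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    kind: str  # "w" for write, "r" for read
    var: str
    val: int


def inverse(rel):
    return {(b, a) for (a, b) in rel}


def compose(r, s):
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def closure(rel):
    result = set(rel)
    while True:
        new = compose(result, result) - result
        if not new:
            return result
        result |= new


def is_acyclic(rel):
    return all(a != b for (a, b) in closure(rel))


def proj(rel, k1, k2):
    return {(a, b) for (a, b) in rel
            if a.kind == k1 and b.kind == k2 and a.var == b.var}


def relaxations(po, wr):
    """ppo, po-loc and wr_e, as in the definition of TSO."""
    ppo = {(a, b) for (a, b) in po if not (a.kind == "w" and b.kind == "r")}
    po_loc = {(a, b) for (a, b) in po if a.var == b.var}
    wr_e = {(a, b) for (a, b) in wr if (a, b) not in po and (b, a) not in po}
    return ppo, po_loc, wr_e


def satisfies_wccm(po, wr, hb_ppo, hb_poloc):
    ppo, po_loc, wr_e = relaxations(po, wr)

    def cf_e(rel):  # cf_e[R] = R_WR ◦ wr_e^-1, distinct writes only
        return {(a, b) for (a, b) in compose(proj(rel, "w", "r"), inverse(wr_e))
                if a != b}

    whb = closure(hb_ppo | hb_poloc)
    wpww = closure(proj(whb, "w", "w") | cf_e(hb_poloc) | cf_e(hb_ppo))
    rw = compose(inverse(wr), proj(wpww, "w", "w"))  # rw[wpww]
    return (is_acyclic(ppo | wr_e | wpww | rw)
            and is_acyclic(po_loc | wr_e | wpww | rw))


def co_pi(po, wr):
    """(co^ppo, co^po-loc), used below in place of (hb^ppo, hb^po-loc)."""
    ppo, po_loc, wr_e = relaxations(po, wr)
    return closure(ppo | wr_e), closure(po_loc | wr_e)


ix, iy = Op("ix", "w", "x", 0), Op("iy", "w", "y", 0)
# Store buffering (allowed by TSO): t0 = w(x,1); r(y,0)    t1 = w(y,1); r(x,0)
a1, a2 = Op("a1", "w", "x", 1), Op("a2", "r", "y", 0)
b1, b2 = Op("b1", "w", "y", 1), Op("b2", "r", "x", 0)
sb_po = {(i, o) for i in (ix, iy) for o in (a1, a2, b1, b2)} | {(a1, a2), (b1, b2)}
sb_wr = {(iy, a2), (ix, b2)}
print(satisfies_wccm(sb_po, sb_wr, *co_pi(sb_po, sb_wr)))  # True

# Stale read after one's own write: t0 = w(x,1); r(x,0)
c1, c2 = Op("c1", "w", "x", 1), Op("c2", "r", "x", 0)
st_po = {(ix, c1), (ix, c2), (c1, c2)}
st_wr = {(ix, c2)}
print(satisfies_wccm(st_po, st_wr, *co_pi(st_po, st_wr)))  # False
```

The stale read passes the ppo condition (the write-to-read edge is dropped) but fails the po-loc one, where w(x,1) → r(x,0) → w(x,1) is a cycle.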

**Lemma 3.** *If a history satisfies TSO, then it satisfies wCCM.*

*Proof.* Let h = ⟨O, po, wr⟩ be a history satisfying TSO. Then, there exists a store order ww such that po-loc ∪ wr<sub>e</sub> ∪ ww ∪ rw and ppo ∪ wr<sub>e</sub> ∪ ww ∪ rw are both acyclic. The fact that

$$\mathsf{hb}^{\mathsf{po\text{-}loc}} \subseteq (\mathsf{po\text{-}loc} \cup \mathsf{wr}_e \cup \mathsf{ww} \cup \mathsf{rw})^+ \text{ and } \mathsf{hb}^{\mathsf{ppo}} \subseteq (\mathsf{ppo} \cup \mathsf{wr}_e \cup \mathsf{ww} \cup \mathsf{rw})^+$$

can be proved by structural induction as in the case of SC (the step of the proof showing that hb ⊆ (po ∪ wr ∪ ww ∪ rw[ww])<sup>+</sup>). Then, since ww is a total order on writes on the same variable, we get that the projection of whb (the transitive closure of the union of hb<sup>po-loc</sup> and hb<sup>ppo</sup>) on pairs of writes on the same variable is included in ww. Therefore, whb<sub>WW</sub> ⊆ ww. Then, since cf<sub>e</sub>[R<sub>π</sub>] ⊆ R<sub>π</sub> for each R<sub>π</sub> = (π ∪ wr<sub>e</sub> ∪ ww ∪ rw)<sup>+</sup> with π ∈ {ppo, po-loc}, and since each cf<sub>e</sub>[R<sub>π</sub>] relates only writes on the same variable, we get that each cf<sub>e</sub>[R<sub>π</sub>] is included in ww. This implies that wpww ⊆ ww.

Finally, since wpww ⊆ ww, we get that (π ∪ wr<sub>e</sub> ∪ wpww ∪ rw[wpww])<sup>+</sup> ⊆ (π ∪ wr<sub>e</sub> ∪ ww ∪ rw[ww])<sup>+</sup>, for each π ∈ {ppo, po-loc}. In each case, the acyclicity of the latter implies the acyclicity of the former. Therefore, h satisfies wCCM. ∎

```
Input: A history h = ⟨O, po, wr⟩
Output: true iff h satisfies TSO
if ppo ∪ wr_e ∪ wpww ∪ rw[wpww] or po-loc ∪ wr_e ∪ wpww ∪ rw[wpww] is cyclic then
    return false;
end
foreach store order ww ⊇ wpww do
    if ppo ∪ wr_e ∪ ww ∪ rw[ww] and po-loc ∪ wr_e ∪ ww ∪ rw[ww] are acyclic then
        return true;
    end
end
return false;
```
**Algorithm 2.** Checking TSO conformance.
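To make the control flow of Algorithm 2 concrete, the following Python sketch enumerates the store orders that extend wpww (one linear extension per variable, combined with a Cartesian product) and stops at the first one that passes the acyclicity check. It assumes a hypothetical callback `acyclic_with(store_order)` standing in for the two checks on ppo ∪ wr_e ∪ so ∪ rw[so] and po-loc ∪ wr_e ∪ so ∪ rw[so]; computing those relations is not shown. The enumeration is brute force over permutations, i.e., it mirrors the explicit-enumeration variant, not an optimized implementation.

```python
from itertools import permutations, product
from typing import Callable, Dict, Hashable, Iterator, List, Set, Tuple

Write = Hashable
Order = Set[Tuple[Write, Write]]


def linear_extensions(writes: List[Write], partial: Order) -> Iterator[Tuple[Write, ...]]:
    """Yield every total order of `writes` that respects the pairs in `partial`."""
    for perm in permutations(writes):
        pos = {w: i for i, w in enumerate(perm)}
        # Pairs mentioning writes outside this list (other variables) are ignored.
        if all(pos[a] < pos[b] for a, b in partial if a in pos and b in pos):
            yield perm


def check_tso(writes_by_var: Dict[str, List[Write]], wpww: Order,
              acyclic_with: Callable[[Order], bool]) -> bool:
    """Sketch of Algorithm 2; `acyclic_with` is a hypothetical stand-in for both checks."""
    # First check: a wCCM violation already rules out TSO.
    if not acyclic_with(wpww):
        return False
    # Then try every per-variable total order extending wpww.
    per_var = [list(linear_extensions(ws, wpww)) for ws in writes_by_var.values()]
    for choice in product(*per_var):
        ww = {(a, b) for order in choice
              for i, a in enumerate(order) for b in order[i + 1:]}
        if acyclic_with(ww):
            return True
    return False
```

The fewer pairs wpww leaves unordered, the fewer linear extensions `linear_extensions` has to produce, which is where the cost of this loop lies.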

The converse of Lemma 3 does not hold. Indeed, it is easy to see that wCCM is weaker than CCM (since wpww is included in pww), and the history in Fig. 3, which satisfies CCM but not TSO (as explained at the beginning of the section), is also an example of a history that satisfies wCCM but not TSO. Moreover, wCCM is incomparable to CM. For instance, the history in Fig. 1b is allowed by wCCM (since it is allowed by TSO, as explained at the beginning of the section) but not by CM. Also, since CCM is stronger than CM, the history in Fig. 3 satisfies CM but not wCCM (since it does not satisfy TSO). These relationships are summarized in Fig. 2. Establishing the precise relation between CC/CCv and TSO is hard because they are defined using one, resp., two, acyclicity conditions. We believe that CC and CCv are weaker than TSO, but we do not have a formal proof.

Finally, it can be seen that, similarly to pww, the weak partial store order wpww can be computed in polynomial time, and therefore:

**Theorem 3.** *Checking whether a history satisfies wCCM is polynomial time in the size of the history.*

#### **4.2 An Algorithm for Checking TSO Conformance**

The algorithm for checking TSO conformance of a given history is given in Algorithm 2. It starts by checking whether the history violates the weaker consistency model wCCM; if so, it returns false. Otherwise, it enumerates orders between the writes that are not related by the weak partial store order wpww until it finds one that establishes TSO conformance, in which case it returns true. If no such order exists, it returns false.

**Theorem 4.** *Algorithm 2 returns true iff the input history* h *satisfies TSO.*

#### **5 Experimental Evaluation**

To demonstrate the practical value of the theory developed in the previous sections, we show empirically that our algorithms are efficient and scalable. We experiment with both the SC and the TSO algorithms, comparing their running time to that of a standard encoding of these models into Boolean satisfiability, on a benchmark obtained by running realistic cache coherence protocols within the Gem5 simulator [5] in system emulation mode.

Histories are generated with random clients of the following cache coherence protocols included in the Gem5 distribution: MI, MOESI Hammer, MESI Two Level, and MOESI AMD Base. The randomization process is parametrized by the number of cpus (threads) and the total number of read/write operations. We ensure that every value is written at most once.

We have compared two variations of our algorithms for checking SC/TSO with a standard encoding of SC/TSO into boolean satisfiability (named X-SAT where X is SC or TSO). The two variations differ in the way in which the partial store order pww dictated by CCM is completed to a total store order ww as required by SC/TSO: either using standard enumeration (named X-CCM+Enum where X is SC or TSO) or using a SAT solver (named X-CCM+SAT where X is SC or TSO).

The partial store order pww is computed by encoding its definition into a DATALOG program. The inductive definition of $\mathsf{hb}^o$ translates easily into DATALOG rules, and the same holds for the union of two relations and for their composition. We used Clingo [19] to run the DATALOG programs.
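The DATALOG encoding used in the experiments is not reproduced here. As a hedged illustration of the kind of least fixpoint such rules compute, the sketch below evaluates the two classic transitive-closure rules bottom-up (naive evaluation) in plain Python, with union and composition as the only operations; the relation names `edge` and `path` are illustrative and not taken from the paper.

```python
from typing import Hashable, Set, Tuple

Rel = Set[Tuple[Hashable, Hashable]]


def compose(r: Rel, s: Rel) -> Rel:
    """Relational composition r ; s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def transitive_closure(edge: Rel) -> Rel:
    """Least fixpoint of the DATALOG rules
         path(X, Y) :- edge(X, Y).
         path(X, Z) :- path(X, Y), edge(Y, Z).
    evaluated naively until no new facts are derived."""
    path = set(edge)
    while True:
        new = compose(path, edge) - path   # facts derivable in one more step
        if not new:
            return path
        path |= new                        # union with the facts found so far
```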

#### **5.1 Checking SC**

Figure 4 reports on the running time of the three algorithms while increasing the number of operations or cpus. All the histories considered in this experiment satisfy SC. This is intentional: valid histories force our algorithms to enumerate extensions of the partial store order, whereas SC violations may already be detected while checking CCM. The graph on the left shows the evolution of the running time when increasing the number of operations from 100 to 500 in increments of 100, with a constant number of 4 cpus. For each number of operations, we considered 200 histories and computed the average running time. The graph on the right shows the running time when increasing the number of cpus from 2 to 6 in increments of 1. For x cpus, we limited the number of operations to 50x. As before, for each number of cpus, we considered 200 histories and computed the average running time.

(a) Checking SC while varying the number of operations.

(b) Checking SC while varying the number of cpus.

**Fig. 4.** Checking SC for valid histories.

As can be observed, our algorithms scale much better than the SAT encoding and, interestingly, the difference between an explicit enumeration of pww extensions and one using a SAT solver is not significant. Note that even small improvements on the average running time provide large speedups when taking into account the whole testing process, i.e., checking consistency for a possibly large number of (randomly generated) executions. For instance, the work on McVerSi [13], which focuses on the complementary problem of finding clients that increase the probability of uncovering bugs, shows that exposing bugs in some realistic cache coherence implementations requires as much as 24 h of continuous testing.

Since the bottleneck in our algorithms is the enumeration of pww extensions, we measured the percentage of pairs of writes that are *not* ordered by pww. To this end, we considered a random sample of 200 histories (with 200 operations per history) and found this percentage to be just 6.6%, which is surprisingly low. This explains the net gain over a SAT encoding of SC, since the number of pww extensions that need to be enumerated is quite low. As a side remark, using CCv instead of CCM in the algorithms above leads to a drastic increase in the number of unordered writes: for the same random sample of 200 histories, using CCv instead of CCM leaves 57.75% of unordered writes on average, which is considerably higher than the percentage obtained with CCM.
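The measurement above is a simple ratio over pairs of writes. The sketch below assumes that a partial store order is given as a set of ordered pairs and that only pairs of writes on the same variable are counted, since store orders only relate such writes; the input format and the function name are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Set, Tuple


def unordered_percentage(writes_by_var: Dict[str, List[Hashable]],
                         order: Set[Tuple[Hashable, Hashable]]) -> float:
    """Percentage of same-variable write pairs related by `order` in neither direction."""
    total = unordered = 0
    for writes in writes_by_var.values():
        for a, b in combinations(writes, 2):
            total += 1
            if (a, b) not in order and (b, a) not in order:
                unordered += 1
    return 100.0 * unordered / total if total else 0.0
```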

We have also evaluated our algorithms on SC violations. These violations were generated by reordering statements from the MI implementation, e.g., swapping the order of the actions `s_store_hit` and `p_profileHit` in the transition `transition(M, Store)`. As an optimization, our implementation gradually checks the weaker variations of causal consistency, CC and CCv, before checking CCM. This increases the chances of returning early in the case of a violation (a violation of CC/CCv is also a violation of CCM and SC). We considered 1000 histories with 100 to 400 operations and 2 to 8 cpus, equally distributed as a function of the number of cpus. Figure 5 reports on the evolution of the average running time. Since all these histories happen to be CCM violations, SC-CCM+Enum and SC-CCM+SAT have the same running time. As an evaluation of our optimization, we found that 50% of the histories violate a weaker variation of causal consistency, CC or CCv.

**Fig. 5.** Checking SC for invalid histories while increasing the number of cpus.

(a) Checking TSO while varying the number of operations.

(b) Checking TSO while varying the number of cpus.

**Fig. 6.** Checking TSO for valid histories.

#### **5.2 Checking TSO**

We have evaluated our TSO algorithms on the same set of histories used for SC in Fig. 4; the results are reported in Fig. 6. Since these histories satisfy SC, they satisfy TSO as well. As in the case of SC, our algorithms scale better than the SAT encoding. However, unlike for SC, the enumeration of wpww extensions using a SAT solver outperforms the explicit enumeration. Since this difference was negligible in the case of SC, the SAT-based variation seems to be generally preferable.

#### **6 Related Work**

While several static techniques have been developed to prove that a shared-memory implementation (or cache coherence protocol) satisfies SC [1,4,9–12,17,20,23,27,28], few have addressed dynamic techniques such as testing and runtime verification (which scale to more realistic implementations). From the complexity standpoint, Gibbons and Korach [21] showed that checking whether a history is SC is NP-hard, while Alur et al. [4] showed that checking SC for finite-state shared-memory implementations (over a bounded number of threads, variables, and values) is undecidable. Furbach et al. [18] proved that checking whether a history satisfies TSO is also NP-hard.

Several works have addressed the testing problem for related criteria, e.g., linearizability. While SC requires that the operations in a history be explained by a linearization that is consistent with the program order, linearizability requires that such a linearization also be consistent with the real-time order between operations (linearizability is stronger than SC). The works in [25,30] describe monitors for checking linearizability that construct linearizations of a given history incrementally, in an online fashion. This incremental construction cannot be adapted to SC since it strongly relies on the specificities of linearizability. Line-Up [8] performs systematic concurrency testing via schedule enumeration, and offline linearizability checking via linearization enumeration. The works in [15,16] show that checking linearizability for some particular classes of ADTs is polynomial time. Emmi and Enea [14] consider the problem of checking weak consistency criteria, but their approach focuses on specific relaxations in those criteria, falling back to an explicit enumeration of linearizations in the context of a criterion like SC or TSO. Bouajjani et al. [6] consider the problem of checking causal consistency. They formalize the different variations of causal consistency we consider in this work and show that checking whether a history satisfies one of these variations is polynomial time.

The complementary issue of test generation, i.e., finding clients that increase the probability of uncovering bugs in shared memory implementations, has been approached in the McVerSi framework [13]. Their methodology for checking a criterion like SC lies within the context of white-box testing, i.e., the user is required to annotate the shared memory implementation with events that define the store order in an execution. Our algorithms have the advantage that the implementation is treated as a black-box requiring less user intervention.

#### **7 Conclusion**

We have introduced an approach for checking the conformance of a computation to SC or to TSO, problems known to be NP-hard. The idea is to avoid an explicit enumeration of the exponential number of possible total orders between writes. Our approach is to define weaker criteria that are as strong as possible but still checkable in polynomial time. This is useful for (1) early detection of violations, and (2) reducing the number of pairs of writes for which an order must be found in order to check SC/TSO conformance. Essentially, the approach consists in capturing an "as large as possible" partial order on writes that can be computed in polynomial time (using a least fixpoint calculation) and that is a subset of any total order witnessing SC/TSO conformance. Our experimental results show that this approach is indeed useful and performant: it catches most violations early using an efficient check, and it computes a large kernel of write constraints that significantly reduces the number of pairs of writes left to be ordered in an enumerative way. Future work consists in exploring the application of this approach to other correctness criteria that are hard to check, such as serializability in the context of transactional programs.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Checking Robustness Against Snapshot Isolation**

Sidi Mohamed Beillahi, Ahmed Bouajjani, and Constantin Enea

> Université de Paris, IRIF, CNRS, Paris, France {beillahi,abou,cenea}@irif.fr

**Abstract.** Transactional access to databases is an important abstraction allowing programmers to consider blocks of actions (transactions) as executing in isolation. The strongest consistency model is *serializability*, which ensures the atomicity abstraction of transactions executing over a sequentially consistent memory. Since ensuring serializability carries a significant penalty on availability, modern databases provide weaker consistency models, one of the most prominent being *snapshot isolation*. In general, the correctness of a program relying on serializable transactions may be broken when using weaker models. However, certain programs may also be insensitive to consistency relaxations, i.e., all their properties holding under serializability are preserved even when they are executed over a weakly consistent database and without additional synchronization.

In this paper, we address the issue of verifying whether a given program is *robust against snapshot isolation*, i.e., all its behaviors are serializable even if it is executed over a database ensuring snapshot isolation. We show that this verification problem is polynomial time reducible to a state reachability problem in transactional programs over a sequentially consistent shared memory. This reduction opens the door to the reuse of classic verification technology for reasoning about weakly consistent programs. In particular, we show that it can be used to derive a proof technique, based on Lipton's reduction theory, for proving programs robust.

#### **1 Introduction**

Transactions simplify concurrent programming by enabling computations on shared data that are isolated from other concurrent computations and resilient to failures. Modern databases provide transactions in various forms corresponding to different tradeoffs between consistency and availability. The strongest consistency level is achieved with *serializable* transactions [21] whose outcome in concurrent executions is the same as if the transactions were executed atomically in some order. Since serializability carries a significant penalty on availability, modern databases often provide weaker consistency models, one of the most prominent being *snapshot isolation* (SI) [5]. Then, an important issue is to ensure that the level of consistency needed by a given program coincides with the one that is guaranteed by its infrastructure, i.e., the database it uses. One way to tackle this issue is to investigate the problem of checking *robustness* of programs against consistency relaxations: given a program P and two consistency models S and W such that S is stronger than W, we say that P is robust for S against W if for every two implementations $I_S$ and $I_W$ of S and W respectively, the set of computations of P when running with $I_S$ is the same as its set of computations when running with $I_W$. This means that P is not sensitive to the consistency relaxation from S to W; therefore, it is possible to reason about the behaviors of P assuming that it runs over S, and no additional synchronization is required for P to maintain, when running over the weaker model W, all the properties it satisfies under S.

This work is supported in part by the European Research Council (ERC) under the Horizon 2020 research and innovation programme (grant agreement No 678177).

In this paper, we address the problem of verifying robustness of transactional programs for serializability against *snapshot isolation*. Under snapshot isolation, any transaction t reads values from a snapshot of the database taken at its start, and t can commit only if no other transaction that committed since t started has written to a location that t wrote to. Robustness is a form of program equivalence between two versions of the same program, obtained using two semantics, one more permissive than the other. It ensures that this permissiveness has no effect on the program under consideration. The difficulty in checking robustness is to capture the extra behaviors allowed by the relaxed model w.r.t. the strong model. This requires, a priori, reasoning about complex order constraints between operations in arbitrarily long computations, which may require maintaining unbounded ordered structures and make robustness checking hard or even undecidable.

Our first contribution is to show that verifying robustness of transactional programs against snapshot isolation can be reduced in polynomial time to the reachability problem in concurrent programs under sequential consistency (SC). This makes it possible (1) to avoid explicit handling of the snapshots from which transactions read along computations (since this may imply memorizing unbounded information), and (2) to leverage available tools for verifying invariants/reachability problems on concurrent programs. This also implies that the robustness problem is decidable for finite-state programs, PSPACE-complete when the number of sites is fixed, and EXPSPACE-complete otherwise. This is the first result on the decidability and complexity of the problem of verifying robustness in the context of transactional programs. The problem of verifying robustness has been considered in the literature for several models, including eventual and causal consistency [6,10–12,20]. These works provide (over- or under-)approximate analyses for checking robustness, but none of them provides precise (sound and complete) algorithmic verification methods for solving this problem.

Based on this reduction, our second contribution is a proof methodology for establishing robustness which builds on Lipton's reduction theory [18]. We use the theory of movers to establish whether the relaxations allowed by SI are harmless, i.e., they don't introduce new behaviors compared to serializability.

We applied the proposed verification techniques on 10 challenging applications extracted from previous work [2,6,11,14,16,19,24]. These techniques were enough for proving or disproving the robustness of these applications.

Complete proofs and more details can be found in [4].

**Fig. 1.** Examples of non-robust programs illustrating the difference between SI and serializability. *causal dependency* means that a read in a transaction obtains its value from a write in another transaction. *conflict* means that a write in a transaction is not visible to a read in another transaction, but it would affect the read value if it were visible. Here, *happens-before* is the union of the two.

#### **2 Overview**

In this section, we give an overview of our approach for checking robustness against snapshot isolation. While serializability enforces that transactions are atomic and that conflicting transactions, i.e., transactions that read or write a common location, *cannot* commit concurrently, SI [5] allows conflicting transactions to commit in parallel as long as they do not have a write-write conflict, i.e., they do not both write a common location. Moreover, under SI, each transaction reads from a snapshot of the database taken at its start. These relaxations permit the "anomaly" known as Write Skew (WS), shown in Fig. 1a, where an anomaly is a program execution that is allowed by SI but not by serializability. The execution of Write Skew under SI allows the reads of x and y to both return 0, although this cannot happen under serializability. These values are possible since each transaction is executed locally (starting from the initial snapshot) without observing the writes of the other transaction.
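The Write Skew outcome can be reproduced by brute force. This sketch assumes the transaction bodies implied by the execution discussed in this section ($t_1$ reads y and writes x := 1; $t_2$ reads x and writes y := 1, with hypothetical registers r1 and r2); Fig. 1a may differ in details. It enumerates all serial orders and compares them with a fully concurrent SI run in which both transactions read the initial snapshot.

```python
from itertools import permutations
from typing import Dict, Set, Tuple

Mem = Dict[str, int]
INIT: Mem = {"x": 0, "y": 0}


def t1(view: Mem) -> Tuple[Dict[str, int], Mem]:
    # Assumed body: read y into register r1, then write x := 1.
    return {"r1": view["y"]}, {"x": 1}


def t2(view: Mem) -> Tuple[Dict[str, int], Mem]:
    # Assumed body: read x into register r2, then write y := 1.
    return {"r2": view["x"]}, {"y": 1}


def serial_outcomes(txns) -> Set[Tuple[int, int]]:
    """(r1, r2) for every serial order: each transaction sees all earlier writes."""
    outcomes = set()
    for order in permutations(txns):
        mem: Mem = dict(INIT)
        regs: Dict[str, int] = {}
        for t in order:
            r, w = t(mem)
            regs.update(r)
            mem.update(w)
        outcomes.add((regs["r1"], regs["r2"]))
    return outcomes


def si_concurrent_outcome(txns) -> Tuple[int, int]:
    """Every transaction reads the initial snapshot; commits abort on write-write conflicts."""
    regs: Dict[str, int] = {}
    committed: Set[str] = set()
    for t in txns:
        r, w = t(dict(INIT))              # snapshot taken before any commit
        if committed.intersection(w):
            raise RuntimeError("write-write conflict: transaction aborts")
        regs.update(r)
        committed.update(w)
    return regs["r1"], regs["r2"]
```

The write sets {x} and {y} are disjoint, so both SI commits succeed and the run yields (0, 0), an outcome that no serial order produces.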

**Execution Trace.** Our notion of program robustness is based on an abstract representation of executions called *trace*. Informally, an execution trace is a set of events, i.e., accesses to shared variables and transaction begin/commit events, along with several standard dependency relations between events recording the data-flow. The transitive closure of the union of all these dependency relations is called *happens-before*. An execution is an anomaly if the happens-before of its trace is cyclic. Figure 1b shows the happens-before of the Write Skew anomaly. Notice that the happens-before order is cyclic in both cases.
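Since the transitive closure of a relation is cyclic iff the union graph of its dependencies has a cycle, deciding whether an execution is an anomaly reduces to cycle detection. The sketch below uses a depth-first search with three colours over transactions; the edges in the usage test are read off the Write Skew discussion (each transaction's read conflicts with the other's write) and are illustrative, not a trace produced by the authors' tooling.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def has_cycle(nodes: Iterable[Hashable],
              edges: Iterable[Tuple[Hashable, Hashable]]) -> bool:
    """True iff the directed graph (nodes, edges) contains a cycle."""
    succ: Dict[Hashable, List[Hashable]] = {n: [] for n in nodes}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
        succ.setdefault(b, [])
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in succ}

    def visit(n: Hashable) -> bool:
        colour[n] = GREY                  # n is on the current DFS path
        for m in succ[n]:
            if colour[m] == GREY:         # back edge closes a cycle
                return True
            if colour[m] == WHITE and visit(m):
                return True
        colour[n] = BLACK                 # finished: no cycle reachable through n
        return False

    return any(colour[n] == WHITE and visit(n) for n in succ)
```

With both conflicts of Write Skew, t1 → t2 (read of y vs. write of y) and t2 → t1 (read of x vs. write of x), the graph is cyclic; dropping the read of y in t1 removes the first edge and the cycle with it.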

Semantically, every transaction execution involves two main events, the issue and the commit. The issue event corresponds to a sequence of reads and/or writes where the writes are visible only to the current transaction. We interpret it as a single event since a transaction starts with a database snapshot that it updates in isolation, without observing other concurrently executing transactions. The commit event is where the writes are propagated and made visible to all processes. Under serializability, the two events coincide, i.e., they are adjacent in the execution. Under SI, this is not the case, and between the issue and the commit of the same transaction there may be issue/commit events from concurrent transactions. When a transaction commit does not occur immediately after its issue, we say that the underlying transaction is *delayed*. For example, the following execution of WS corresponds to the happens-before cycle in Fig. 1b, where the write to x was committed after $t_2$ finished; hence, $t_1$ was delayed:

$$\begin{aligned}
&\mathsf{begin}(p_1,t_1)\ \mathsf{ld}(p_1,t_1,y,0)\ \mathsf{isu}(p_1,t_1,x,1)\\
&\qquad \mathsf{begin}(p_2,t_2)\ \mathsf{ld}(p_2,t_2,x,0)\ \mathsf{isu}(p_2,t_2,y,1)\ \mathsf{com}(p_2,t_2)\\
&\mathsf{com}(p_1,t_1)
\end{aligned}$$

Above, $\mathsf{begin}(p_1,t_1)$ stands for starting a new transaction $t_1$ by process $p_1$, ld represents read (load) actions, while isu denotes write actions that are visible only to the current transaction (not yet committed). The writes performed during $t_1$ become visible to all processes once the commit event $\mathsf{com}(p_1,t_1)$ takes place.

**Reducing Robustness to SC Reachability.** The above SI execution can be mimicked by an execution of the same program under serializability modulo an instrumentation that simulates the delayed transaction. The local writes in the issue event are simulated by writes to auxiliary registers and the commit event is replaced by copying the values from the auxiliary registers to the shared variables (actually, it is not necessary to simulate the commit event; we include it here for presentation reasons). The auxiliary registers are visible only to the delayed transaction. In order that the execution be an anomaly (i.e., not possible under serializability without the instrumentation) it is required that the issue and the commit events of the delayed transaction are linked by a chain of happens-before dependencies. For instance, the above execution for WS can be simulated by:

$$\begin{aligned}
&\mathsf{begin}(p_1,t_1)\ \mathsf{ld}(p_1,t_1,y,0)\ \mathsf{st}(p_1,t_1,r_x,1)\\
&\qquad \mathsf{begin}(p_2,t_2)\ \mathsf{ld}(p_2,t_2,x,0)\ \mathsf{isu}(p_2,t_2,y,1)\ \mathsf{com}(p_2,t_2)\\
&\mathsf{st}(p_1,t_1,x,r_x)
\end{aligned}$$

The write to x was delayed by storing the value in the auxiliary register $r_x$, and the happens-before chain exists because the read of y by $t_1$ conflicts with the write of y by $t_2$, and the read of x by $t_2$ conflicts with the write of x in the simulation of $t_1$'s commit event. On the other hand, the following execution of Write Skew without the read of y in $t_1$:

$$\begin{aligned}
&\mathsf{begin}(p_1,t_1)\ \mathsf{st}(p_1,t_1,r_x,1)\\
&\qquad \mathsf{begin}(p_2,t_2)\ \mathsf{ld}(p_2,t_2,x,0)\ \mathsf{isu}(p_2,t_2,y,1)\ \mathsf{com}(p_2,t_2)\\
&\mathsf{st}(p_1,t_1,x,r_x)
\end{aligned}$$

misses the conflict (happens-before dependency) between the issue event of $t_1$ and $t_2$. Therefore, the events of $t_2$ can be reordered to the left of $t_1$, yielding an equivalent execution where $\mathsf{st}(p_1,t_1,x,r_x)$ occurs immediately after $\mathsf{st}(p_1,t_1,r_x,1)$. In this case, $t_1$ is no longer delayed, and this execution is possible under serializability (without the instrumentation).
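The instrumentation of a delayed transaction described above can be sketched as a source-to-source transformation. The instruction format (tuples `("read", reg, var)` and `("write", var, expr)`, where `expr` is a literal or a register name) and the `r_` prefix for auxiliary registers are assumptions made for illustration. The sketch only splits one transaction into its simulated issue part and commit part; it does not perform the reachability check.

```python
from typing import List, Tuple

Instr = Tuple[str, str, object]


def instrument_delayed(body: List[Instr]) -> Tuple[List[Instr], List[Instr]]:
    """Split a transaction body into (issue part, commit part) for the SC simulation."""
    issue: List[Instr] = []
    written: List[str] = []          # shared variables written so far, in order
    for kind, a, b in body:
        if kind == "write":          # ("write", var, expr): redirect to auxiliary register r_var
            issue.append(("write", "r_" + a, b))
            if a not in written:
                written.append(a)
        elif kind == "read":         # ("read", reg, var): own earlier writes stay visible locally
            issue.append(("read", a, "r_" + b if b in written else b))
        else:
            raise ValueError(f"unknown instruction {kind!r}")
    # Commit simulation: copy each auxiliary register back to its shared variable.
    commit = [("write", x, "r_" + x) for x in written]
    return issue, commit
```

Applied to $t_1$ of Write Skew, the issue part stores 1 into $r_x$ and the commit part copies $r_x$ into x, matching $\mathsf{st}(p_1,t_1,r_x,1)$ and $\mathsf{st}(p_1,t_1,x,r_x)$ in the simulated executions.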

If the number of transactions to be delayed in order to expose an anomaly is unbounded, the instrumentation described above may need an unbounded number of auxiliary registers. This would make the verification problem hard or even undecidable. However, we show that it is actually enough to delay a single transaction, i.e., a program admits an anomaly under SI iff it admits an anomaly containing a single delayed transaction. This result implies that the number of auxiliary registers needed by the instrumentation is bounded by the number of program variables, and that checking robustness against SI can be reduced in linear time to a reachability problem under serializability (the reachability problem encodes the existence of the chain of happens-before dependencies mentioned above). The proof of this reduction relies on a nontrivial characterization of anomalies.

**Proving Robustness Using Commutativity Dependency Graphs.** Based on the reduction above, we also devise an approximate method for checking robustness based on the concept of mover in Lipton's reduction theory [18]. An event is a left (resp., right) mover if it commutes to the left (resp., right) of another event (from a different process) while preserving the computation. We use the notion of mover to characterize happens-before dependencies between transactions. Roughly, there exists a happens-before dependency between two transactions in some execution if one does not commute to the left/right of the other one. We define a commutativity dependency graph that summarizes the happens-before dependencies, in all executions of a given program, between transactions t as they appear in the program, transactions t \ {w} where the writes of t are deactivated (i.e., their effects are not visible outside the transaction), and transactions t \ {r} where the reads of t obtain non-deterministic values. The transactions t \ {w} are used to simulate issue events of delayed transactions (where writes are not yet visible), while the transactions t \ {r} are used to simulate commit events of delayed transactions (which only write to the shared memory). Two transactions a and b are linked by an edge iff a *cannot* move to the right of b (or b cannot move to the left of a), or if they are related by the program order (i.e., issued in some order in the same process). Then a program is robust if, for every transaction t, this graph *does not* contain a path from t \ {w} to t \ {r} formed of transactions that do not write to a variable that t writes to (the latter condition is enforced by SI, since two concurrent transactions that write to a common variable cannot both commit). For example, Fig. 2 shows the commutativity dependency graph of the modified WS program where the read of y is removed from $t_1$. Since this graph contains no such path, the modified program is robust.

**Fig. 2.** Commutativity dependency graph of WS where the read of *y* is omitted.
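The path criterion above can be checked with a breadth-first search per transaction. In this sketch the graph edges are an input: deciding which transactions fail to commute requires a mover analysis that is not shown. Nodes are encoded as hypothetical pairs `(t, kind)` with kind `"full"`, `"no_w"` (for t \ {w}) or `"no_r"` (for t \ {r}), and the test graphs are toy instances, not Fig. 2. Because the method is approximate, a returned transaction only means the check is inconclusive for it; `None` means no such path exists.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Node = Tuple[str, str]   # (transaction, kind), kind in {"full", "no_w", "no_r"}


def find_dependency_path(txns: Iterable[str], writes: Dict[str, Set[str]],
                         edges: Iterable[Tuple[Node, Node]]) -> Optional[str]:
    """Return a transaction t with a path from (t, "no_w") to (t, "no_r") whose
    intermediate nodes write no variable written by t, or None if there is none."""
    succ: Dict[Node, List[Node]] = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)

    def node_writes(node: Node) -> Set[str]:
        t, kind = node
        return set() if kind == "no_w" else writes[t]   # writes of t \ {w} are deactivated

    for t in txns:
        src, dst = (t, "no_w"), (t, "no_r")
        seen, todo = {src}, deque([src])
        while todo:
            u = todo.popleft()
            for v in succ.get(u, []):
                if v == dst:
                    return t
                # Only traverse nodes that write nothing t writes (enforced by SI).
                if v not in seen and not (node_writes(v) & writes[t]):
                    seen.add(v)
                    todo.append(v)
    return None
```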

#### **3 Programs**

A program is a parallel composition of *processes* distinguished using a set of identifiers $\mathbb{P}$. Each process is a sequence of *transactions*, and each transaction is a sequence of *labeled instructions*. Each transaction starts with a begin instruction and finishes with a commit instruction. Every other instruction is either an assignment to a process-local *register* from a set $\mathbb{R}$ or to a *shared variable* from a set $\mathbb{V}$, or an assume statement. The read/write assignments use values from a data domain $\mathbb{D}$. An assignment to a register reg := var is called a *read* of the shared variable var, and an assignment to a shared variable var := reg-expr is called a *write* to var (reg-expr is an expression over registers whose syntax we leave unspecified since it is irrelevant for our development). The statement assume bexpr blocks the process if the Boolean expression bexpr over registers is false; assume statements are used to model conditionals as usual. We use goto statements to model arbitrary control flow: the same label can be assigned to multiple instructions, and multiple goto statements can direct the control to the same label, which makes it possible to mimic imperative constructs like loops and conditionals. To simplify the technical exposition, our syntax includes simple read/write instructions. However, our results also apply to instructions that include SQL (select/update) queries. The experiments reported in Sect. 7 consider programs with SQL-based transactions.

The semantics of a program under SI is defined as follows. The shared variables are stored in a central memory, and each process keeps a replicated copy of the central memory. A process starts a transaction by discarding its local copy and fetching the values of the shared variables from the central memory. When a process commits a transaction, it merges its local copy of the shared variables with the one stored in the central memory in order to make its updates visible to all processes. During the execution of a transaction, the process stores the writes to shared variables only in its local copy and reads only from its local copy. When a process merges its local copy with the central one, it is required that no shared variable updated by the current transaction has been updated in the central memory by a concurrent transaction since the last fetch. Otherwise, the transaction is aborted and its effects are discarded.
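The central-memory/local-copy semantics just described can be prototyped in a few lines. This is a sketch under assumptions: the "no concurrent update since the last fetch" check is realized with a per-variable commit counter (one of several possible realizations), each process runs one transaction at a time, all variables are declared in the initial memory, and the class and method names are hypothetical.

```python
from typing import Dict


class SIMemory:
    """Central memory plus one local copy per process (snapshot isolation sketch)."""

    def __init__(self, init: Dict[str, int]) -> None:
        self.central = dict(init)
        self.version = {x: 0 for x in init}   # number of commits that wrote each variable
        self.local: Dict[str, dict] = {}

    def begin(self, p: str) -> None:
        # Discard any previous local copy and fetch a fresh snapshot.
        self.local[p] = {"vals": dict(self.central),
                         "fetched": dict(self.version),
                         "written": set()}

    def read(self, p: str, x: str) -> int:
        return self.local[p]["vals"][x]       # reads only see the local copy

    def write(self, p: str, x: str, v: int) -> None:
        self.local[p]["vals"][x] = v          # writes stay local until commit
        self.local[p]["written"].add(x)

    def commit(self, p: str) -> bool:
        ctx = self.local.pop(p)
        # Abort if a variable written here was updated centrally since the fetch.
        if any(self.version[x] != ctx["fetched"][x] for x in ctx["written"]):
            return False
        for x in ctx["written"]:              # merge: publish the local writes
            self.central[x] = ctx["vals"][x]
            self.version[x] += 1
        return True
```

Running Write Skew on this sketch, both commits succeed and both reads return 0, whereas two concurrent transactions that both write x cannot both commit.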

More precisely, the semantics of a program P under SI is defined as a labeled transition system $[P]_{\mathsf{SI}}$ whose transitions are labeled by the set of events

$$\mathbb{EV} = \{\mathsf{begin}(p,t), \mathsf{ld}(p,t,x,v), \mathsf{isu}(p,t,x,v), \mathsf{com}(p,t) : p \in \mathbb{P}, t \in \mathbb{T}, x \in \mathbb{V}, v \in \mathbb{D}\}$$

where begin and com label transitions corresponding to the start and the commit of a transaction, respectively. isu and ld label transitions corresponding to writing, resp., reading, a shared variable during some transaction.

An execution of program P under snapshot isolation is a sequence of events $ev_1 \cdot ev_2 \cdot \ldots$ corresponding to a run of $[P]_{\mathsf{SI}}$. The set of executions of P under SI is denoted by $\mathbb{E}\mathsf{x}_{\mathsf{SI}}(P)$.

#### **4 Robustness Against** SI

A *trace* abstracts the order in which shared variables are accessed inside a transaction and the order between transactions accessing different variables. Formally, the trace of an execution ρ is obtained by (1) replacing each sub-sequence of transitions in ρ corresponding to the same transaction, but excluding the com transition, with a single "macro-event" isu(p, t), and (2) adding several standard relations between these macro-events isu(p, t) and commit events com(p, t) to record the data-flow in ρ, e.g., which transaction wrote the value read by another transaction. The sequence of isu(p, t) and com(p, t) events obtained in the first step is called a *summary of* ρ. We say that a transaction t in ρ performs an *external read* of a variable x if ρ contains an event ld(p, t, x, v) which is not preceded by a write on x of t, i.e., an event isu(p, t, x, v). Also, we say that a transaction t *writes* a variable x if ρ contains an event isu(p, t, x, v), for some v.

The *trace* tr(ρ) = (τ, PO, WR, WW, RW, STO) of an execution ρ consists of the summary τ of ρ along with the *program order* PO, which relates any two issue events isu(p, t) and isu(p, t') that occur in this order in τ; the *write-read* relation WR (also called *read-from*), which relates any two events com(p, t) and isu(p', t') that occur in this order in τ such that t' performs an external read of x and com(p, t) is the last event in τ before isu(p', t') that writes to x (to mark the variable x, we may use WR(x)); the *write-write* order WW (also called *store-order*), which relates any two store events com(p, t) and com(p', t') that occur in this order in τ and write to the same variable x (to mark the variable x, we may use WW(x)); the *read-write* relation RW (also called *conflict*), which relates any two events isu(p, t) and com(p', t') that occur in this order in τ such that t reads a value that is overwritten by t'; and the *same-transaction* relation STO, which relates the issue event with the commit event of the same transaction. The read-write relation RW is formally defined as $\mathsf{RW}(x) = \mathsf{WR}^{-1}(x);\mathsf{WW}(x)$ (we use ; to denote the standard composition of relations) and $\mathsf{RW} = \bigcup_{x \in \mathbb{V}} \mathsf{RW}(x)$. If a transaction t reads the initial value of x, then RW(x) relates isu(p, t) to com(p', t') of any other transaction t' which writes to x, i.e., (isu(p, t), com(p', t')) ∈ RW(x). Note that in the above relations, p and p' might designate the same process.
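The relations WR, WW and RW can then be derived from a summary together with the external reads and writes of each transaction. The following is an illustrative Python sketch under our own encoding (events as `("isu", t)` / `("com", t)` pairs, edges labeled with the variable); PO and STO are omitted because they follow directly from process identities and transaction names.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[str, str]          # ("isu", t) or ("com", t)
Edge = Tuple[Node, Node, str]   # (source, target, variable)


def trace_relations(summary: List[Tuple[str, str, str]],
                    ext_reads: Dict[str, Set[str]],
                    writes: Dict[str, Set[str]]):
    """Compute WR(x), WW(x) and RW(x) = WR^-1(x); WW(x) from a summary."""
    pos = {(kind, t): i for i, (kind, _p, t) in enumerate(summary)}
    commits = [t for (kind, _p, t) in summary if kind == "com"]
    wr: Set[Edge] = set()
    ww: Set[Edge] = set()
    rw: Set[Edge] = set()
    variables = set().union(*writes.values(), *ext_reads.values())
    for x in variables:
        writers = [t for t in commits if x in writes[t]]  # in commit order
        for i, t in enumerate(writers):
            for later in writers[i + 1:]:
                ww.add((("com", t), ("com", later), x))
        for reader, xs in ext_reads.items():
            if x not in xs:
                continue
            seen = [t for t in writers if pos[("com", t)] < pos[("isu", reader)]]
            if seen:
                src = seen[-1]  # last commit writing x before the read
                wr.add((("com", src), ("isu", reader), x))
                overwriters = writers[writers.index(src) + 1:]
            else:
                overwriters = writers  # reader saw the initial value of x
            for t in overwriters:
                if t != reader:  # self-edges coincide with STO and are omitted
                    rw.add((("isu", reader), ("com", t), x))
    return wr, ww, rw
```

On the classic write-skew summary (both transactions read x and y, one writes x and the other y, and both issue before either commits), this yields no WR or WW edges and two RW edges pointing in opposite directions.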

Since we reason about only one trace at a time, to simplify the writing, we may say that a trace is simply a sequence τ as above, keeping the relations PO, WR, WW, RW, and STO implicit. The set of traces of executions of a program P under SI is denoted by $\mathbb{T}\mathsf{r}_{SI}(P)$.

**Serializability Semantics.** The semantics of a program under serializability can be defined using a transition system where the configurations keep a single shared-variable valuation (accessed by all processes) with the standard interpretation of read and write statements. Each transaction executes in isolation. Alternatively, the serializability semantics can be defined as a restriction of $[P]_{SI}$ to the set of executions where each transaction is *immediately* delivered when it starts, i.e., the start and commit times of the transaction coincide (t.st = t.ct). Such executions are called *serializable*, and the set of serializable executions of a program P is denoted by $\mathbb{E}\mathsf{x}_{SER}(P)$. The latter definition is easier to reason about when relating executions under snapshot isolation and serializability, respectively.

**Serializable Trace.** A trace *tr* is called *serializable* if it is the trace of a serializable execution. Let $\mathbb{T}\mathsf{r}_{SER}(P)$ denote the set of serializable traces. Given a serializable trace *tr* = (τ, PO, WR, WW, RW, STO), every event isu(p, t) in τ is immediately followed by the corresponding com(p, t) event.

**Happens Before Order.** Since multiple executions may have the same trace, it is possible that an execution ρ produced by snapshot isolation has a serializable trace tr(ρ) even though isu(p, t) events may not be immediately followed by com(p, t) events. However, ρ would be equivalent, up to reordering of "independent" (or commutative) transitions, to a serializable execution. To check whether the trace of an execution is serializable, we introduce the *happens-before* relation on the events of a given trace as the transitive closure of the union of all the relations in the trace, i.e., $\mathsf{HB} = (\mathsf{PO} \cup \mathsf{WW} \cup \mathsf{WR} \cup \mathsf{RW} \cup \mathsf{STO})^{+}$.

Finally, the happens-before relation between events is extended to transactions as follows: a transaction $t_1$ *happens-before* another transaction $t_2 \neq t_1$ if the trace *tr* contains an event of transaction $t_1$ which happens-before an event of $t_2$. The happens-before relation between transactions is denoted by $\mathsf{HB}_t$ and called *transactional happens-before*. The following characterizes serializable traces.

**Theorem 1** ([1,23])**.** *A trace tr is serializable iff* $\mathsf{HB}_t$ *is acyclic.*
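Theorem 1 suggests a direct check: close the union of the relations transitively, lift the result to transactions, and look for a cycle. The sketch below is one straightforward way to do this in Python; the edge encoding (pairs of `("isu", t)` / `("com", t)` events, with variable labels dropped) is an assumption of ours. Note that the lifted relation is not necessarily transitive, so a genuine cycle search is used.

```python
from typing import Dict, Iterable, Set, Tuple

Node = Tuple[str, str]  # ("isu", t) or ("com", t)


def happens_before(edges: Iterable[Tuple[Node, Node]]) -> Dict[Node, Set[Node]]:
    """HB = (PO ∪ WW ∪ WR ∪ RW ∪ STO)+, computed by a DFS from every event."""
    succ: Dict[Node, Set[Node]] = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
        succ.setdefault(b, set())
    hb: Dict[Node, Set[Node]] = {}
    for start in succ:
        reached: Set[Node] = set()
        stack = list(succ[start])
        while stack:
            n = stack.pop()
            if n not in reached:
                reached.add(n)
                stack.extend(succ[n])
        hb[start] = reached
    return hb


def is_serializable(edges: Iterable[Tuple[Node, Node]]) -> bool:
    """Theorem 1: the trace is serializable iff HB_t is acyclic."""
    hb = happens_before(edges)
    # Lift HB to transactions: t1 HB_t t2 if some event of t1 HB some event of t2.
    hb_t: Dict[str, Set[str]] = {}
    for a, targets in hb.items():
        for b in targets:
            if a[1] != b[1]:
                hb_t.setdefault(a[1], set()).add(b[1])
    colour: Dict[str, int] = {}  # 1 = on the DFS stack, 2 = finished

    def has_cycle(t: str) -> bool:
        colour[t] = 1
        for u in hb_t.get(t, ()):
            if colour.get(u) == 1 or (u not in colour and has_cycle(u)):
                return True
        colour[t] = 2
        return False

    return not any(t not in colour and has_cycle(t) for t in list(hb_t))
```

For the write-skew pattern, with STO edges from each issue to its commit and RW edges from each issue to the other transaction's commit, `is_serializable` returns `False`; replacing the RW edges by a single WR edge from com(t1) to isu(t2) makes it return `True`.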

A program is called robust if it produces the same set of traces as the serializability semantics.

**Definition 1.** *A program P is called robust against SI iff* $\mathbb{T}\mathsf{r}_{SI}(P) = \mathbb{T}\mathsf{r}_{SER}(P)$.

Since $\mathbb{T}\mathsf{r}_{SER}(P) \subseteq \mathbb{T}\mathsf{r}_{SI}(P)$, the problem of checking robustness of a program P reduces to checking whether there exists a trace $tr \in \mathbb{T}\mathsf{r}_{SI}(P) \setminus \mathbb{T}\mathsf{r}_{SER}(P)$.

#### **5 Reducing Robustness Against** SI **to SC Reachability**

A trace which is not serializable must contain at least one issue event and the commit event of the same transaction that do not occur one after the other, even after reordering "independent" events. Thus, there must exist an event occurring between the two which is related to both of them via the happens-before relation, preventing the issue and commit events from being adjacent. Otherwise, we could build another trace with the same happens-before relation in which events are reordered such that the issue event is immediately followed by the corresponding commit event; this trace would be serializable, contradicting the initial assumption. We define a program instrumentation which mimics the delay of transactions by performing the writes on auxiliary variables that are not visible to other transactions. After the delay of a transaction, we track happens-before dependencies until we execute a transaction that reads one of the variables that the delayed transaction writes to (this would expose a read-write dependency to the commit event of the delayed transaction). While tracking happens-before dependencies, we cannot execute a transaction that writes to a variable that the delayed transaction writes to, since SI forbids write-write conflicts between concurrent transactions.

Concretely, given a program P, we define an instrumentation of P such that P is not robust against SI iff the instrumentation reaches an error state under serializability. The instrumentation uses auxiliary variables in order to simulate a *single* delayed transaction, which we prove is sufficient for deciding robustness. Let isu(p, t) be the issue event of the only delayed transaction. The process p that delayed t is called the *Attacker*. When the attacker finishes executing the delayed transaction, it stops. Other processes that execute transactions afterwards are called *Happens-Before Helpers*.

The instrumentation uses two copies of the set of shared variables in the original program to simulate the delayed transaction. We use primed variables x' to denote the second copy. Thus, when a process becomes the attacker, it only writes to the second copy, which is not visible to other processes, including the happens-before helpers. The writes made by the other processes, including the happens-before helpers, are made visible to all processes.

When the attacker delays the transaction t, it keeps track of the variables it accesses: it stores the name of one of the variables it writes to, x, and it tracks every variable y that it reads from and every variable z that it writes to. When the attacker finishes executing t and some other process wants to execute another transaction, the underlying transaction must contain a write to a variable y that the attacker reads from, and it must not write to a variable that t writes to. We say that this process has joined the happens-before helpers through the underlying transaction. While executing this transaction, we keep track of each variable that was accessed and the type of operation, i.e., whether it is a read or a write. Afterwards, in order for some other transaction to "join" the happens-before path, it must not write to a variable that t writes to, so that it does not violate the fact that SI forbids write-write conflicts, and it has to satisfy one of the following conditions in order to ensure the continuity of the happens-before dependencies: (1) the transaction is issued by a process that already has another transaction in the happens-before dependency (program-order dependency), (2) the transaction reads from a shared variable that was updated by a previous transaction in the happens-before dependency (write-read dependency), (3) the transaction writes to a shared variable that was read by a previous transaction in the happens-before dependency (read-write dependency), or (4) the transaction writes to a shared variable that was updated by a previous transaction in the happens-before dependency (write-write dependency). We introduce a flag for each shared variable to mark the fact that the variable was read or written by a previous transaction.

Processes continue executing transactions as part of the chain of happens-before dependencies until a transaction reads the variable x that t wrote to. In this case, we have reached an error state, which signals that we found a cycle in the transactional happens-before relation.
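At the level of access sets, the joining rules above can be summarized as a check on a candidate chain of helper transactions. This is a simplified, hypothetical abstraction of the instrumentation (no values, no flags): each transaction is reduced to its process and its read and write sets, and, following the pledge described for happens-before helpers, a later transaction may also join by writing a variable read by the delayed transaction.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Txn:
    proc: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]


def closes_cycle(delayed: Txn, x: str, chain: List[Txn]) -> bool:
    """True iff `chain` is a valid happens-before path starting after the delayed
    transaction and ending with a read of x (the instrumentation's error state)."""
    assert x in delayed.writes
    procs: Set[str] = set()
    seen_reads: Set[str] = set()
    seen_writes: Set[str] = set()
    for t in chain:
        if t.proc == delayed.proc:       # the attacker stops after the delayed transaction
            return False
        if t.writes & delayed.writes:    # SI forbids this write-write conflict
            return False
        joins = (bool(t.writes & delayed.reads)     # read-write with the delayed transaction
                 or t.proc in procs                 # (1) program order
                 or bool(t.reads & seen_writes)     # (2) write-read
                 or bool(t.writes & seen_reads)     # (3) read-write
                 or bool(t.writes & seen_writes))   # (4) write-write
        if not joins:
            return False                 # the transaction cannot join the path
        if x in t.reads:
            return True                  # error state: robustness violation
        procs.add(t.proc)
        seen_reads |= t.reads
        seen_writes |= t.writes
    return False
```

Because the sets of earlier helpers are empty for the first transaction of the chain, that transaction can only join through a write to a variable read by the delayed transaction, as the text requires.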

The instrumentation uses three varieties of flags: a) global flags (i.e., HB, $a_{trA}$, $a_{stA}$), b) flags local to a process (i.e., p.a and p.hbh), and c) flags per shared variable (i.e., x.event, x.event', and x.eventI). We will explain the meaning of these flags along with the instrumentation. At the start of the execution, all flags are initialized to null (⊥).

Whether a process is an attacker or happens-before helper is not enforced syntactically by the instrumentation. It is set non-deterministically during the execution using some additional process-local flags. Each process chooses to set to true at most one of the flags p.a and p.hbh, implying that the process becomes an attacker or happens-before helper, respectively. At most one process can be an attacker, i.e., set p.a to true. In the following, we detail the instrumentation for read and write instructions of the attacker and happens-before helpers.

#### **5.1 Instrumentation of the Attacker**

Figure 3 lists the instrumentation of the write and read instructions of the attacker. Each process passes through an initial phase where it executes transactions that are visible immediately to all the other processes (i.e., they are not delayed), and then it can non-deterministically choose to delay a transaction, at which point it sets the flag $a_{trA}$ to true. During the delayed transaction, it non-deterministically chooses a write instruction to a variable x and stores the name of this variable in the flag $a_{stA}$ (line (5)). The values written during the delayed transaction are stored in the primed variables and are visible only to the current transaction, in case the transaction reads its own writes. For example, given a variable z, all writes to z from the original program are transformed into writes to the primed version z' (line (3)). Each time the attacker writes to z, it sets the flag z.event' = 1. This flag is used later by transactions of happens-before helpers to avoid writing to variables that the delayed transaction writes to.


**Fig. 3.** Instrumentation of the Attacker. We use '*x* to denote the name of the shared variable *x*.

A read of a variable y in the delayed transaction takes its value from the primed version y'. In every read in the delayed transaction, we set the flag y.event' to ld (line (1)), to be used later in order for a process to join the happens-before helpers. Afterwards, the attacker starts the happens-before path and sets the variable HB to true (line (2)) to mark the start of the happens-before path. When the flag HB is set to true, the attacker stops executing new transactions.
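The attacker's instrumented reads and writes can be mimicked with an explicit object holding the two copies and the flags. This is a hypothetical Python model rather than the code of Fig. 3: line numbers are not reproduced, the non-deterministic choice of x is turned into an explicit `choose_as_x` argument, the primed copy is assumed to start as a snapshot of the shared variables, and the rule that a write flag is not overwritten by a later read is our assumption.

```python
from typing import Dict, Optional, Union

Flag = Union[int, str]  # 1 marks "written by the delayed transaction", "ld" marks "read"


class Attacker:
    """Hypothetical model of the attacker's instrumented delayed transaction."""

    def __init__(self, shared: Dict[str, int]) -> None:
        self.shared = shared                  # copy visible to every process
        self.primed: Dict[str, int] = {}      # x' copies, visible only to the attacker
        self.event_p: Dict[str, Flag] = {}    # x.event' flags (absent = null)
        self.a_trA = False
        self.a_stA: Optional[str] = None
        self.HB = False

    def begin_delayed(self) -> None:
        self.a_trA = True
        # Assumption: the primed copy starts as a snapshot of the shared variables.
        self.primed = dict(self.shared)

    def write(self, z: str, v: int, choose_as_x: bool = False) -> None:
        self.primed[z] = v   # the write goes to z' only
        self.event_p[z] = 1  # helpers must not write z later (no write-write conflict)
        if choose_as_x and self.a_stA is None:
            self.a_stA = z   # the non-deterministic choice of x, made explicit

    def read(self, y: str) -> int:
        # Assumption: a flag set by an earlier write of the same transaction is kept.
        if self.event_p.get(y) != 1:
            self.event_p[y] = "ld"
        return self.primed[y]

    def end_delayed(self) -> None:
        if self.a_stA is None:
            raise RuntimeError("blocked: no write was chosen as x")
        self.HB = True  # start of the happens-before path; the attacker stops
```

After a delayed transaction that reads y and writes x, the shared copy is unchanged, x' holds the new value, and the flags record y as read and x as written.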

#### **5.2 Instrumentation of the Happens-Before Helpers**

The remaining processes, which are not the attacker, can become happens-before helpers. Figure 4 lists the instrumentation of the write and read instructions of a happens-before helper. In a first phase, each process executes the original code until the flag $a_{trA}$ is set to true by the attacker. This flag signals the "creation" of the secondary copy of the shared variables, which can be observed only by the attacker. At this point, the flag HB is set to true, and the happens-before helper process non-deterministically chooses a first transaction through which it wants to join the set of happens-before helpers, i.e., continue the happens-before dependency created by the existing happens-before helpers. When a process chooses a transaction, it makes a pledge (while executing the begin instruction) that during this transaction it will either read from a variable that was written to by another happens-before helper, write to a variable that was accessed (read or written) by another happens-before helper, or write to a variable that was read from in the delayed transaction. When the pledge is met, the process sets the flag p.hbh to true (lines (7) and (11)). The execution is blocked if a process does not keep its pledge (i.e., the flag p.hbh is null) at the end of the transaction. Note that the first process to join the happens-before helpers has to execute a transaction t which writes to a variable that was read from in the delayed transaction, since this is the only way to build a happens-before dependency between t and the delayed transaction (PO is not possible since t is not from the attacker, WR is not possible since t does not see the writes of the delayed transaction, and WW is not possible since t cannot write to a variable that the delayed transaction writes to). We use a flag x.event for each variable x to record the type (read ld or write st) of the last access made by a happens-before helper (lines (8) and (10)). During the execution of a transaction that is part of the happens-before dependency, we must ensure that the transaction does not write to a variable y where y.event' is set to 1; otherwise, the execution is blocked (line (9)).

The happens-before helpers continue executing their instructions until one of them reads from the shared variable x whose name was stored in $a_{stA}$. This establishes a happens-before dependency between the delayed transaction and a "fictitious" store event corresponding to the delayed transaction that could be executed just after this read of x. The execution does not have to contain this store event explicitly since it is always enabled. Therefore, at the end of every transaction, the instrumentation checks whether the transaction read x. If so, the execution stops and goes to an error state to indicate a robustness violation. Notice that after the attacker stops, the only processes that execute transactions are happens-before helpers. This is justified because, if a process is not a happens-before helper, we cannot construct a happens-before dependency between a transaction of this process and the delayed transaction; hence the two transactions commute, which in turn implies that this process's transactions can be executed before the delayed transaction of the attacker.
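Likewise, the bookkeeping of a helper transaction (pledge, blocking, final check) can be sketched as follows, with one `HelperTxn` object per transaction. This is an illustrative model under stated assumptions, not the paper's Fig. 4: values are omitted, the transaction's accesses are recorded into x.event only at its end, and a recorded write is never downgraded to a read.

```python
from typing import Dict, Optional, Union

Flag = Union[int, str]


class HelperTxn:
    """Hypothetical flag bookkeeping for one transaction of a happens-before helper.

    `event` holds x.event ("ld"/"st", last access by a helper) and `event_p`
    holds x.event' (set by the attacker). Values are omitted.
    """

    def __init__(self, event: Dict[str, str], event_p: Dict[str, Flag],
                 a_stA: Optional[str], hbh: bool = False) -> None:
        self.event, self.event_p, self.a_stA = event, event_p, a_stA
        self.hbh = hbh           # p.hbh: p is already on the path (program order)
        self.pledge = hbh
        self.read_x = False
        self.accesses: Dict[str, str] = {}

    def read(self, y: str) -> None:
        if self.event.get(y) == "st":
            self.pledge = True   # write-read dependency with an earlier helper
        if y == self.a_stA:
            self.read_x = True
        self.accesses.setdefault(y, "ld")

    def write(self, y: str) -> None:
        if self.event_p.get(y) == 1:
            raise RuntimeError("blocked: y is written by the delayed transaction")
        if y in self.event or self.event_p.get(y) == "ld":
            self.pledge = True   # read-write or write-write dependency
        self.accesses[y] = "st"

    def end(self) -> str:
        if not self.pledge:
            return "blocked"     # the pledge made at begin was not kept
        self.hbh = True
        for y, kind in self.accesses.items():
            # Assumption: a recorded write ("st") is never downgraded to "ld".
            if self.event.get(y) != "st":
                self.event[y] = kind
        return "error" if self.read_x else "ok"
```

A transaction that writes a variable read by the delayed transaction and also reads x ends in `"error"`, whereas one that only reads x ends in `"blocked"`, since it never keeps its pledge.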

#### **5.3 Correctness**

The role of a process in an execution is chosen non-deterministically at runtime. Therefore, the final instrumentation of a given program P, denoted by [[P]], is obtained by replacing each labeled instruction linst with the concatenation of the instrumentations corresponding to the attacker and the happens-before helpers, i.e., $[\![\mathit{linst}]\!] ::= [\![\mathit{linst}]\!]_{A}\ [\![\mathit{linst}]\!]_{HbH}$.

The following theorem states the correctness of the instrumentation.

**Theorem 2.** *P is not robust against SI iff* [[P]] *reaches the error state.*

If a program is not robust, then some execution of the program under SI results in a trace whose happens-before relation is cyclic, which is possible only if this execution contains at least one delayed transaction. In the proof of this theorem, we show that it is sufficient to search for executions that contain a single delayed transaction.

Notice that in the instrumentation of the attacker, the delayed transaction must contain a read and a write instruction on different variables. Also, the transactions of the happens-before helpers must not contain a write to a variable that the delayed transaction writes to. The following corollary states the complexity of checking robustness for finite-state programs<sup>1</sup> against snapshot isolation. It is a direct consequence of Theorem 2 and of previous results concerning the reachability problem in concurrent programs running over a sequentially-consistent memory, with a fixed [17] or parametric number of processes [22].


**Fig. 4.** Instrumentation of happens-before helpers.

<sup>1</sup> Programs with a bounded number of variables taking values from a bounded domain.

**Corollary 1.** *Checking robustness of finite-state programs against snapshot isolation is PSPACE-complete when the number of processes is fixed and EXPSPACE-complete, otherwise.*

The instrumentation can be extended to SQL (select/update) queries where a statement may include expressions over a finite/infinite set of variables, e.g., by manipulating a set of flags x.event for each statement instead of only one.

#### **6 Proving Program Robustness**

As a more pragmatic alternative to the reduction in the previous section, we define an approximated method for proving robustness which is inspired by Lipton's reduction theory [18].

**Movers.** Given an execution $\tau = ev_1 \cdot \ldots \cdot ev_n$ of a program P under serializability (where each event $ev_i$ corresponds to executing an entire transaction), we say that the event $ev_i$ *moves right (resp., left)* in τ if $ev_1 \cdot \ldots \cdot ev_{i-1} \cdot ev_{i+1} \cdot ev_i \cdot ev_{i+2} \cdot \ldots \cdot ev_n$ (resp., $ev_1 \cdot \ldots \cdot ev_{i-2} \cdot ev_i \cdot ev_{i-1} \cdot ev_{i+1} \cdot \ldots \cdot ev_n$) is also a valid execution of P, the process of $ev_i$ is different from the process of $ev_{i+1}$ (resp., $ev_{i-1}$), and both executions reach the same end state $\sigma_n$. For an execution τ, let $\mathsf{instOf}_\tau(ev_i)$ denote the transaction that generated the event $ev_i$. A transaction t of a program P is a *right (resp., left) mover* if, for all executions τ of P under serializability, the event $ev_i$ with $\mathsf{instOf}_\tau(ev_i) = t$ moves right (resp., left) in τ.
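For small finite-state transactions, the mover condition can be tested by brute force. The sketch below quantifies over all states of a finite domain, a superset of the reachable states, so a positive answer is conservative; the names `moves_right_of` and `all_states` and the encoding of a blocked transaction as `None` are our assumptions, and the two transactions are assumed to belong to different processes.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Iterator, List, Optional

State = Dict[str, int]
# A transaction runs atomically (serializability); None means it is blocked.
Txn = Callable[[State], Optional[State]]


def run_seq(a: Txn, b: Txn, s: State) -> Optional[State]:
    s1 = a(dict(s))
    return None if s1 is None else b(dict(s1))


def moves_right_of(t: Txn, u: Txn, states: Iterable[State]) -> bool:
    """t moves right past u: whenever t·u is valid, u·t is valid with the same end state.

    Equivalently, u moves left past t.
    """
    for s in states:
        end = run_seq(t, u, s)
        if end is not None and run_seq(u, t, s) != end:
            return False
    return True


def all_states(domain: Iterable[int], variables: List[str]) -> Iterator[State]:
    values = list(domain)
    for combo in product(values, repeat=len(variables)):
        yield dict(zip(variables, combo))
```

For instance, an increment of x moves right past an increment of y, but not past `y := x` (a write-read dependency), nor past a decrement of x guarded by `x > 0`, since swapping may disable the guard.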

If a transaction t is not a right mover, then there must exist an execution τ of P under serializability and an event $ev_i$ of τ with $\mathsf{instOf}(ev_i) = t$ that does not move right. This implies that there must exist another event $ev_{i+1}$ of τ which caused $ev_i$ not to be a right mover. Since $ev_i$ and $ev_{i+1}$ do not commute, this must be because of a write-read, write-write, or read-write dependency. If $t' = \mathsf{instOf}(ev_{i+1})$, we say that t is not a right mover because of t' and some dependency that is either write-read, write-write, or read-write. Notice that when t is not a right mover because of t', then t' is not a left mover because of t.

We define $\mathsf{M}_{WR}$ as a binary relation between transactions such that $(t, t') \in \mathsf{M}_{WR}$ when t is *not* a right mover because of t' and a write-read dependency. We define the relations $\mathsf{M}_{WW}$ and $\mathsf{M}_{RW}$ corresponding to write-write and read-write dependencies in a similar way.

**Read/Write-free Transactions.** Given a transaction t, we define $t \setminus \{r\}$ as a variation of t where all the reads from shared variables are replaced with non-deterministic reads, i.e., reg := var statements are replaced with reg := $\star$, where $\star$ denotes non-deterministic choice. We also define $t \setminus \{w\}$ as a variation of t where all the writes to shared variables in t are disabled. Intuitively, recalling the reduction to SC reachability in Sect. 5, $t \setminus \{w\}$ simulates the delay of a transaction by the Attacker, i.e., the writes are not made visible to other processes, and $t \setminus \{r\}$ approximates the commit of the delayed transaction, which only applies a set of writes.
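On a toy instruction encoding, both variations are simple syntactic transformations. The encoding below is hypothetical and only meant to illustrate the definitions.

```python
from typing import List, Tuple

# Hypothetical toy encoding of transaction bodies:
#   ("read", reg, var)   reg := var   (read of a shared variable)
#   ("write", var, reg)  var := reg   (write to a shared variable)
#   ("havoc", reg)       reg := ⋆     (non-deterministic choice)
Instr = Tuple[str, ...]


def read_free(t: List[Instr]) -> List[Instr]:
    """t \\ {r}: every shared read reg := var becomes reg := ⋆."""
    return [("havoc", ins[1]) if ins[0] == "read" else ins for ins in t]


def write_free(t: List[Instr]) -> List[Instr]:
    """t \\ {w}: every write to a shared variable is disabled."""
    return [ins for ins in t if ins[0] != "write"]
```

For a transaction that reads x and y and writes x, the read-free variant keeps only the write (with havocked registers), and the write-free variant keeps only the reads.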

**Commutativity Dependency Graph.** Given a program P, we define the commutativity dependency graph as a graph where vertices represent transactions and their read/write-free variations. Two vertices which correspond to the original transactions in P are related by a program order edge, if they belong to the same process. The other edges in this graph represent the "non-mover" relations MWR, MWW, and MRW.

Given a program P, we say that the commutativity dependency graph of P contains a *non-mover cycle* if there exists a set of transactions $t_0, t_1, \ldots, t_n$ of P such that the following hold:


A non-mover cycle approximates an execution of the instrumentation defined in Sect. 5 in between the moment that the Attacker delays a transaction $t_0$ (which here corresponds to the write-free variation $t_0 \setminus \{w\}$) and the moment where $t_0$ gets committed (the read-free variation $t_0 \setminus \{r\}$).

The following theorem shows that the acyclicity of the commutativity dependency graph of a program implies the robustness of the program. Actually, the notion of robustness in this theorem relies on a slightly different notion of trace where store-order and write-read dependencies take into account values, i.e., store-order relates only writes writing different values, and the write-read relation relates a read to the oldest write (w.r.t. execution order) writing its value. This relaxation helps avoid some harmless robustness violations due to, for instance, two transactions writing the same value to some variable.

**Theorem 3.** *For a program* P*, if the commutativity dependency graph of* P *does not contain non-mover cycles, then* P *is robust.*
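Since acyclicity of the whole graph rules out non-mover cycles in particular, a conservative robustness check only needs a cycle test. The Python sketch below assumes the non-mover pairs have already been computed (for example with a mover checker), takes program-order edges from the listed order of each process, and uses Kahn's algorithm; a cycle it reports is not necessarily a non-mover cycle, so it may reject robust programs.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def build_cdg(processes: Dict[str, List[str]],
              non_mover: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Vertices are transaction names (variations such as "t\\{w}" are just more names).

    Assumptions: program-order edges follow the listed order of each process, and
    the non-mover pairs (M_WR ∪ M_WW ∪ M_RW) were computed elsewhere.
    """
    g: Dict[str, Set[str]] = {}
    for txns in processes.values():
        for a, b in zip(txns, txns[1:]):
            g.setdefault(a, set()).add(b)
    for a, b in non_mover:
        g.setdefault(a, set()).add(b)
    return g


def is_acyclic(g: Dict[str, Set[str]]) -> bool:
    """Kahn's algorithm: the graph is acyclic iff every vertex can be removed."""
    nodes = set(g) | {b for succs in g.values() for b in succs}
    indegree = {n: 0 for n in nodes}
    for succs in g.values():
        for b in succs:
            indegree[b] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for b in g.get(n, ()):
            indegree[b] -= 1
            if indegree[b] == 0:
                queue.append(b)
    return removed == len(nodes)
```

If `is_acyclic` returns `True`, Theorem 3 applies; otherwise the finer non-mover-cycle conditions must be examined before drawing any conclusion.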

#### **7 Experiments**

To test the applicability of our robustness checking algorithms, we have considered a benchmark of 10 applications extracted from the literature related to weakly consistent databases in general. The first set of applications consists of open-source projects that were implemented to run over the Cassandra database, extracted from [11]. The second set of applications is composed of: TPC-C [24], an on-line transaction processing benchmark widely used in the database community; SmallBank, a simplified representation of a banking application [2]; FusionTicket, a movie ticketing application [16]; Auction, an online auction application [6]; and Courseware, a course registration service extracted from [14,19].


**Table 1.** An overview of the analysis results. CDG stands for commutativity dependency graph. The columns PO and PT show the number of proof obligations and the proof time in seconds, respectively. T stands for trivial when the application has only read-only transactions.

A first experiment concerns the reduction of robustness checking to SC reachability. For each application, we have constructed a client (i.e., a program composed of transactions defined within that application) with a fixed number of processes (at most 3) and a fixed number of transactions (between 3 and 7 transactions per process). We have encoded the instrumentation of this client, defined in Sect. 5, in the Boogie programming language [3] and used the Civl verifier [15] in order to check whether the assertions introduced by the instrumentation are violated (which would represent a robustness violation). Note that since clients are of fixed size, this requires no additional assertions/invariants (it is an instance of bounded model checking). The results are reported in Table 1. We have found two of the applications, Courseware and SmallBank, to *not* be robust against snapshot isolation. The violation in Courseware is caused by the transactions RemoveCourse and EnrollStudent executing concurrently: RemoveCourse removes a course that has no registered student, while EnrollStudent registers a new student to the same course. We get an invalid state where a student is registered for a course that was removed. SmallBank's violation involves the transactions Balance, TransactSaving, and WriteCheck. One process executes WriteCheck, where it withdraws an amount from the checking account after checking that the sum of the checking and savings accounts is greater than this amount. Concurrently, a second process executes TransactSaving, where it withdraws an amount from the savings account after checking that this amount is smaller than the balance of the savings account. Afterwards, the second process checks the contents of both the checking and savings accounts. We get an invalid state where the sum of the checking and savings accounts is negative.
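The Courseware violation can be reproduced on a heavily simplified, hypothetical encoding with two integer variables (`course_exists` and `enrolled`; these names and guards are ours, not the benchmark's). Both transactions run on the same snapshot and write disjoint variables, so SI lets both commit, and the resulting state differs from the outcome of every serial order.

```python
from itertools import permutations
from typing import Callable, Dict, List

State = Dict[str, int]
Txn = Callable[[State], State]  # maps a snapshot to the writes it performs


def remove_course(snap: State) -> State:
    # Hypothetical simplification: remove the course only if nobody is enrolled.
    return {"course_exists": 0} if snap["enrolled"] == 0 else {}


def enroll_student(snap: State) -> State:
    # Hypothetical simplification: register a student only if the course exists.
    return {"enrolled": snap["enrolled"] + 1} if snap["course_exists"] else {}


def run_serial(init: State, order: List[Txn]) -> State:
    state = dict(init)
    for t in order:
        state.update(t(state))  # each transaction sees all earlier commits
    return state


def run_concurrent_si(init: State, txns: List[Txn]) -> State:
    # All transactions fetch the same snapshot; under SI they may all commit
    # only if their write sets are pairwise disjoint (no write-write conflict).
    writes = [t(dict(init)) for t in txns]
    keys = [set(w) for w in writes]
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            if keys[i] & keys[j]:
                raise ValueError("write-write conflict: SI aborts one transaction")
    state = dict(init)
    for w in writes:
        state.update(w)
    return state


init = {"course_exists": 1, "enrolled": 0}
txns = [remove_course, enroll_student]
serial_outcomes = [run_serial(init, list(order)) for order in permutations(txns)]
si_outcome = run_concurrent_si(init, txns)
print(si_outcome)                     # {'course_exists': 0, 'enrolled': 1}
print(si_outcome in serial_outcomes)  # False
```

The serial orders either remove the course with no student or enroll the student and keep the course; only the concurrent SI execution reaches the invalid state with a student registered for a removed course.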

Since the first experiment considers fixed clients, the lack of assertion violations does not imply that the application is robust (this instantiation of our reduction can only be used to reveal robustness violations). Thus, a second experiment concerns the robustness proof method based on commutativity dependency graphs (Sect. 6). For the applications that were not identified as non-robust by the previous method, we have used Civl to construct their commutativity dependency graphs, i.e., to identify the "non-mover" relations $\mathsf{M}_{WR}$, $\mathsf{M}_{WW}$, and $\mathsf{M}_{RW}$ (Civl allows checking whether some code fragment is a left/right mover). In all cases, the graph did not contain non-mover cycles, which allows us to conclude that the applications are robust.

The experiments show that our results can be used for finding violations and proving robustness, and that they apply to a large set of interesting examples. Note that the reduction to SC and the proof method based on commutativity dependency graphs are valid for programs with SQL (select/update) queries.

#### **8 Related Work**

Decidability and complexity of robustness have been investigated in the context of relaxed memory models such as TSO and Power [7,9,13]. Our work borrows some high-level principles from [7], which addresses robustness against TSO. We reuse the high-level methodology of characterizing minimal violations according to some measure and defining reductions to SC reachability using a program instrumentation. Instantiating this methodology in our context is, however, very different, with several fundamental differences:


Other works [9,13] define decision procedures which are based on the theory of regular languages and do not extend to infinite-state programs like in our case.

As far as we know, our work provides the first results concerning the decidability and the complexity of robustness checking in the context of transactions. The existing work on the verification of robustness for transactional programs provides either over- or under-approximate analyses. Our commutativity dependency graphs are similar to the static dependency graphs used in [6,10–12], but they are more precise, i.e., they reduce the number of false alarms. The static dependency graphs record happens-before dependencies between transactions based on a syntactic approximation of the variables accessed by a transaction. For example, our techniques are able to prove that the program in Fig. 5 is robust, while this is not possible using static dependency graphs. The latter would contain a dependency from transaction $t_1$ to $t_2$ and one from $t_2$ to $t_1$ just because, syntactically, each of the two transactions reads both variables and may write to one of them. Our dependency graphs take into account the semantics of these transactions and do not include this happens-before cycle. Other over- and under-approximate analyses have been proposed in [20]. They are based on encoding executions into first-order logic: bounded model checking for the under-approximate analysis, and a sound check for proving a cut-off bound on the size of the happens-before cycles possible in the executions of a program for the over-approximate analysis. The latter is strictly less precise than our method based on commutativity dependency graphs. For instance, extending the TPC-C application with three additional transactions (for adding a new product, adding a new warehouse based on the number of customers and warehouses, and adding a new customer, respectively) makes the method in [20] fail, while our method succeeds in proving robustness.

$$\begin{array}{ll}
\mathtt{p1:} & \mathtt{p2:} \\
\mathtt{t1:\ [\ if\ (x > y)} & \mathtt{t2:\ [\ if\ (y > x)} \\
\qquad \mathtt{r1 = x - y} & \qquad \mathtt{r2 = y - x} \\
\qquad \mathtt{x = y\ ]} & \qquad \mathtt{y = x\ ]}
\end{array}$$

**Fig. 5.** A robust program.

Finally, the idea of using Lipton's reduction theory for checking robustness has been also used in the context of the TSO memory model [8], but the techniques are completely different, e.g., the TSO technique considers each update in isolation and doesn't consider non-mover cycles like in our commutativity dependency graphs.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Efficient Verification of Network Fault Tolerance via Counterexample-Guided Refinement**

Nick Giannarakis<sup>1</sup>(B), Ryan Beckett<sup>2</sup>, Ratul Mahajan<sup>3,4</sup>, and David Walker<sup>1</sup>

<sup>1</sup> Princeton University, Princeton, NJ 08544, USA {ng8,dpw}@cs.princeton.edu

<sup>2</sup> Microsoft Research, Redmond, WA 98052, USA ryan.beckett@microsoft.com

<sup>3</sup> University of Washington, Seattle, WA 98195, USA ratul@cs.washington.edu

<sup>4</sup> Intentionet, Seattle, WA, USA

**Abstract.** We show how to verify that large data center networks satisfy key properties such as all-pairs reachability under a bounded number of faults. To scale the analysis, we develop algorithms that identify network symmetries and compute small abstract networks from large concrete ones. Using counterexample-guided abstraction refinement, we successively refine the computed abstractions until the given property may be verified. The soundness of our approach relies on a novel notion of network approximation: routing paths in the concrete network are not precisely simulated by those in the abstract network but are guaranteed to be "at least as good." We implement our algorithms in a tool called Origami and use them to verify reachability under faults for standard data center topologies. We find that Origami computes abstract networks with 1–3 orders of magnitude fewer edges, which makes it possible to verify large networks that are out of reach of existing techniques.

#### **1 Introduction**

Most networks decide how to route packets from point A to B by executing one or more distributed routing protocols such as the Border Gateway Protocol (BGP) and Open Shortest Path First (OSPF). To achieve end-to-end policy objectives related to cost, load balancing, security, etc., network operators author configurations for each router. These configurations control various aspects of the route computation such as filtering and ranking route information received from neighbors, information injection from one protocol to another, and so on.

This work was supported in part by NSF Grants 1703493 and 1837030, and gifts from Cisco and Facebook. Any opinions, findings, and conclusions expressed are those of the authors and do not necessarily reflect those of the NSF, Cisco or Facebook.

This flexibility, however, comes at a cost: configuring individual routers to enforce the desired policies of the distributed system is complex and error-prone [15,21]. The problem of configuration is further compounded by three challenges. The first is network scale. Large networks such as those of cloud providers can consist of millions of lines of configuration spread across thousands of devices. The second is that operators must account for interactions with external neighbors who may send arbitrary routing messages. Finally, one has to deal with *failures*. Hardware failures are common [14] and lead to a combinatorial explosion of different possible network behaviors.

To combat the complexity of distributed routing configurations, researchers have suggested a wide range of network verification [2,13,25] and simulation [11,12,23] techniques. These techniques are effective on small and medium-sized networks, but they cannot analyze data centers with thousands of routers and all their possible failures. To enable scalable analyses, it seems necessary to exploit the symmetries that exist in most large real networks. Indeed, other researchers have exploited symmetries to scale verification in the past [3,22]. However, it has never been possible to account for failures, as they introduce asymmetries that change routing behaviors in unpredictable ways.

To address this challenge, we develop a new algorithm for verifying reachability in networks in the presence of faults, based on the idea of counterexample-guided abstraction refinement (CEGAR) [5]. The algorithm starts by factoring out symmetries using techniques developed in prior work [3] and then attempts verification of the abstract network using an SMT solver. If verification succeeds, we are done. However, if verification fails, we examine the counterexample to decide whether we have a true failure or must refine the abstraction further and attempt verification anew. By focusing on reachability, the refinement procedure can be accelerated by using efficient graph algorithms, such as min cut, to rule out invalid abstractions in the middle of the CEGAR loop.

We prove the correctness of our algorithm using a new theory of faulty networks that accounts for the impact of all combinations of *k* failures. Our key insight is that, while routes computed in the abstract network may not simulate those of the concrete network exactly, under the right conditions they are guaranteed to *approximate* them. The approximation relation between concrete and abstract networks suffices to verify key properties such as reachability.

We implemented our algorithms in a tool called Origami and measured their performance on common data center network topologies. We find that Origami computes abstract networks with 1–3 orders of magnitude fewer edges. This reduction speeds verification dramatically and enables verification of networks that are out of reach of current state-of-the-art tools [2].

## **2 Key Ideas**

The goal of Origami is to speed up network verification in the presence of faults, and it does so by computing small, abstract networks with *similar* behavior to a given concrete network.

**Fig. 1.** All graph edges shown correspond to edges in the network topology, and we draw edges as directed to denote the direction of forwarding eventually determined for each node by the distributed routing protocols for a fixed destination *d*. In (a) nodes use shortest path routing to route to the destination *d*. (b) shows a compressed network that precisely captures the forwarding behavior of (a). (c) shows how forwarding is impacted by a link failure, shown as a red line. (d) shows a compressed network that is a sound approximation of the original network for any single link failure. (Color figure online)

As a first approximation, one can view a network as a directed graph capturing the physical topology, and its routing solution as a subgraph where the remaining edges denote the forwarding decision at each node for some fixed destination. In the absence of faults, given a concrete and abstract network, one can define a natural notion of similarity as a graph homomorphism: assigning each concrete node a corresponding abstract node such that, for any solution to the routing problem, the concrete node forwards "in the same direction" as the corresponding abstract node. For example, the concrete network in Fig. 1a is related to its abstract counterpart in Fig. 1b according to the node colors.

Unfortunately, we run into two significant problems when defining abstractions in this manner in the presence of faults. First, the concrete nodes of Fig. 1a have at least 2 disjoint paths to the destination, whereas abstract nodes of Fig. 1b have just one path to the destination, so the abstract network does not preserve the desired fault tolerance properties. Second, consider Fig. 1c, which illustrates how the routing decisions change when a failure occurs. Here, the nodes ($b_1$ in particular) no longer route "in the same direction" as the original network or its abstraction. Hence the invariant connecting concrete and abstract networks is violated.

**Lossy Compression.** To achieve compression given a bounded number of link failures, we relax the notion of similarity between concrete and abstract nodes: A node in the abstract network may merely *approximate* the behavior of concrete nodes. This makes it possible to compress nodes that, in the presence of failures, may route differently. In general, when we fail a single link in the abstract network, we are over-approximating the failures in the concrete network by failing multiple concrete links, possibly more than desired. Nevertheless, the paths taken in the concrete network can only deviate so much from the paths found in the abstract network:

*Property 1.* If a node has a route to the destination in the presence of $k$ link failures, then it has a route that is "at least as good" (as prescribed by the routing protocol) in the presence of $k'$ link failures for $k' < k$.

This relation suffices to verify important network reliability properties, such as reachability, in the presence of faults. Just as importantly, it allows us to achieve effective network compression to scale verification.

Revisiting our example, consider the new abstract network of Fig. 1d. When the link between $b_{12}$ and $d$ has failed, $b_{12}$ still captures the behavior of $b_1$ precisely. However, $b_2$ has a better (in this case, shorter) path to $d$. Despite this difference, if the operator's goal was to prove reachability to the destination under any single fault, then this abstract network suffices.

**From Specification to Algorithm.** It is not too difficult to find abstract networks that approximate a concrete network; the challenge is finding a valid abstract network that is *small enough* to make verification feasible and yet *large enough* to include sufficiently many paths to verify the given fault tolerance property. Rather than attempting to compute a single abstract network with the right properties all in one shot, we search the space of abstract networks using an algorithm based on *counterexample-guided abstraction refinement* [5].

The CEGAR algorithm begins by computing the smallest possible valid abstract network. In the example above, this corresponds to the original compressed network in Fig. 1b, which faithfully approximates the original network when there are no link failures. However, if we try to verify reachability in the presence of a single fault, we will conclude that nodes $b$ and $a$ have no route to the destination when the link between $b$ and $d$ fails. The counterexample due to this failure could, of course, be spurious (and indeed it is). Fortunately, we can easily distinguish whether such a failure is due to a lack of connectivity or is an artifact of over-abstraction by calculating the number of corresponding concrete failures. In this example, a failure on the link $\langle b, d \rangle$ corresponds to 3 concrete failures. Since we are interested in verifying reachability under a single failure, this cannot constitute an actual counterexample.
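The spuriousness check amounts to counting the concrete links that lie behind each failed abstract link. Below is a minimal Python sketch of that check. It assumes a node map `f` from concrete to abstract nodes and undirected links. The names `concrete_failure_count` and `is_spurious` are hypothetical, and the three-node topology is made up in the spirit of Fig. 1.

```python
def concrete_failure_count(f, concrete_links, failed_abstract_links):
    """Number of concrete links whose image under f is a failed abstract link.

    f maps each concrete node to its abstract node; links are undirected
    pairs, so they are compared as frozensets.
    """
    failed = {frozenset(link) for link in failed_abstract_links}
    return sum(1 for (u, v) in concrete_links
               if frozenset((f[u], f[v])) in failed)


def is_spurious(f, concrete_links, failed_abstract_links, k):
    """A counterexample that needs more than k concrete failures cannot
    happen in the concrete network under the k-failure bound."""
    return concrete_failure_count(f, concrete_links, failed_abstract_links) > k


# Hypothetical topology: b1, b2, b3 each linked to d, all mapped to abstract b.
f = {"b1": "b", "b2": "b", "b3": "b", "d": "d"}
concrete = [("b1", "d"), ("b2", "d"), ("b3", "d")]
print(concrete_failure_count(f, concrete, [("b", "d")]))  # 3
print(is_spurious(f, concrete, [("b", "d")], k=1))        # True
```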

The next step is to *refine* our abstraction by splitting some of the abstract nodes. The idea is to use the counterexample from the previous iteration to split the abstract network in a way that avoids giving rise to the same spurious counterexample in the next iteration (Sect. 5). Doing so results in the somewhat larger network of Fig. 1d. A second verification pass over this larger network takes longer, but succeeds.
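The loop just described can be written as a generic skeleton. The minimal Python sketch below assumes the caller supplies hypothetical `verify`, `is_real` and `refine` callbacks. It is not Origami's implementation, which also searches over plausible candidates before calling the solver.

```python
from typing import Callable, Optional, TypeVar

Abs = TypeVar("Abs")   # type of an abstraction
Cex = TypeVar("Cex")   # type of a counterexample


def cegar(initial: Abs,
          verify: Callable[[Abs], Optional[Cex]],
          is_real: Callable[[Abs, Cex], bool],
          refine: Callable[[Abs, Cex], Abs],
          max_iters: int = 100):
    """Generic CEGAR loop. Returns ("verified", abstraction),
    ("violated", counterexample), or ("unknown", abstraction) once
    max_iters refinements have been tried."""
    abstraction = initial
    for _ in range(max_iters):
        cex = verify(abstraction)          # None means the property holds
        if cex is None:
            return ("verified", abstraction)
        if is_real(abstraction, cex):      # e.g. at most k concrete failures
            return ("violated", cex)
        abstraction = refine(abstraction, cex)
    return ("unknown", abstraction)


# Toy run: the "abstraction" is just a size that refinement grows, and
# verification succeeds once it reaches 3.
print(cegar(1,
            verify=lambda n: None if n >= 3 else f"cex@{n}",
            is_real=lambda n, c: False,
            refine=lambda n, c: n + 1))   # ('verified', 3)
```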

#### **3 The Network Model**

Though there are a wide variety of routing protocols in use today, they share a lot in common. Griffin *et al.* [16] showed that protocols like BGP and others solve instances of the *stable paths problem*, a generalization of the shortest paths problem, and Sobrinho [24] demonstrated their semantics and properties can be modelled using routing algebras. We extend these foundations by defining *stable paths problems with faults* (SPPFs), an extension of the classic Stable Paths Problem that admits the possibility of a bounded number of link failures. In later sections, we use this network model to develop generic network compression algorithms and reason about their correctness.

**Stable Path Problems with Faults (SPPF**s**):** An SPPF is an instance of the stable paths problem with faults. Informally, each instance defines the routing behavior of an operational network. The definition includes both the network topology as well as the routing policy. The policy specifies the way routing messages are transformed as they travel along links and through the user-configured import and export filters/transformers of the devices, and also how the preferred routes are chosen at a given device. In our formulation, each problem instance also incorporates a specification of the possible failures and their impact on the routing solutions.

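Formally, an SPPF is a tuple $(G, A, a_d, \prec, \mathsf{trans}, k)$ with six components:

1. $G = (V, E)$: the network topology, with a distinguished destination node $d \in V$.
2. $A$: the set of routing attributes; we write $A_\infty = A \cup \{\infty\}$, where $\infty$ denotes "no route".
3. $a_d \in A$: the initial attribute announced by the destination.
4. $\prec$: the preference relation over $A_\infty$, where smaller attributes are preferred and $\infty$ is the worst route.
5. $\mathsf{trans} : E \times A_\infty \to A_\infty$: the transfer function, which transforms an attribute as it crosses an edge and returns $\infty$ when a route is dropped.
6. $k$: the maximum number of link failures.
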
**Examples:** By choosing an appropriate set of routing attributes, a preference relation and a transfer function, one can model the semantics of commonly used routing protocols. For instance, the Routing Information Protocol (RIP) is a simple shortest paths protocol. It can be modelled by an SPPF where (1) the set of attributes *A* is the set of integers between 0 and 15 (*i.e.*, the set of permitted path lengths), (2) the preference relation is integer inequality, so shorter paths are preferred, and (3) the transfer function increments the received attribute by 1 or drops the route if it would exceed the maximum hop count of 15:

$$\mathsf{trans}(e,a) = \begin{cases} \infty & \text{if } \ a \ge 15\\ a+1 & \text{otherwise} \end{cases}$$
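The RIP instance translates almost line for line into code. Below is a minimal Python sketch that uses `math.inf` for the special attribute $\infty$. The function names `rip_trans` and `rip_prefers` are hypothetical. The edge argument is ignored because every link behaves the same way in this simple model.

```python
import math

INF = math.inf   # the special "no route" attribute
MAX_HOPS = 15    # RIP's largest permitted path length


def rip_trans(edge, a):
    """RIP transfer function: add one hop, or drop the route (INF) when
    the result would exceed the hop limit."""
    if a >= MAX_HOPS:   # also true for a == INF
        return INF
    return a + 1


def rip_prefers(a, b):
    """The preference relation a ≺ b: shorter paths are strictly preferred."""
    return a < b


print([rip_trans(None, a) for a in (0, 14, 15, INF)])  # [1, 15, inf, inf]
```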

Going beyond simple shortest paths, BGP is a complex, policy-driven protocol that drives the Internet and, increasingly, data centers [18]. Operators often choose BGP due to its high expressiveness. We can model a version of BGP (simplified for presentation) using messages consisting of triples (LP, Comm, Path), where LP is an integer-valued local preference, Comm is a set of community values (which are essentially string tags) and Path is a list of nodes, representing the path a routing message has traversed. The transfer function always adds the current device to the Path (or drops the message if a loop is detected) and modifies the LP and Comm components of the attribute according to the device configuration. For instance, one device may attach a community tag to a route and another device may filter or modify routes that have the tag attached. The protocol semantics dictates the preference relation (preferring routes with higher local preference first, and shorter paths second). A more complete BGP model is not fundamentally harder to build: it simply has additional attribute fields and more complex transfer and preference relations [20].
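The simplified BGP attribute can be sketched in a few lines. The minimal Python sketch below makes three assumptions. `None` stands for $\infty$. An edge $(u, v)$ means $u$ receives the route from $v$. Configuration-specific behavior is reduced to a single hypothetical community filter, `drop_tag`. The names `BgpRoute`, `bgp_trans` and `bgp_rank` are illustrative, not Origami's.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class BgpRoute:
    lp: int                   # local preference: higher is preferred
    comm: FrozenSet[str]      # community tags attached to the route
    path: Tuple[str, ...]     # nodes traversed, most recent first


def bgp_trans(edge, route: Optional[BgpRoute],
              drop_tag: Optional[str] = None) -> Optional[BgpRoute]:
    """Transfer over edge (u, v), where u receives the route from v.
    None plays the role of the 'no route' attribute."""
    u, _v = edge
    if route is None or u in route.path:
        return None                      # no route, or a loop through u
    if drop_tag is not None and drop_tag in route.comm:
        return None                      # hypothetical community filter
    return BgpRoute(route.lp, route.comm, (u,) + route.path)


def bgp_rank(route: Optional[BgpRoute]):
    """Sort key for the preference relation (smaller key = preferred):
    higher LP first, then shorter path; 'no route' ranks last."""
    if route is None:
        return (1, 0, 0)
    return (0, -route.lp, len(route.path))


origin = BgpRoute(lp=100, comm=frozenset(), path=("d",))
r_a = bgp_trans(("a", "d"), origin)
print(r_a.path)                       # ('a', 'd')
print(bgp_trans(("d", "a"), r_a))     # None: d is already on the path
```

A node's best route among several candidates is then `min(candidates, key=bgp_rank)`.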

**SPPF Solutions:** In a network, routers repeatedly exchange messages, applying their transfer functions to neighbor routes and selecting a current best route based on the preference relation, until the network reaches a fixpoint (stable state). Interestingly, Griffin *et al.* [16] showed that all routing solutions can be described via a set of local stability constraints. We exploit this insight to define a series of logical constraints that capture all possible routing behaviors in a setting that includes link failures. More specifically, we define a *solution* (*a.k.a.* *stable state*) $\mathcal{S}$ of an SPPF to be a pair $\langle \mathcal{L}, \mathcal{F} \rangle$ of a labelling $\mathcal{L}$ and a failure scenario $\mathcal{F}$. The labelling $\mathcal{L}$ is an assignment of the final attributes to nodes in the network. If an attribute *a* is assigned to node *v*, we say that node has selected (or prefers) that attribute over other attributes available to it. The chosen route also determines packet forwarding. If a node *X* selects a route from neighbor *Y*, then *X* will forward packets to *Y*. The failure scenario $\mathcal{F}$ is an assignment of 0 (has not failed) or 1 (has failed) to each edge in the network.

A solution $\mathcal{S} = \langle \mathcal{L}, \mathcal{F} \rangle$ to an $\mathit{SPPF} = (G, A, a_d, \prec, \mathsf{trans}, k)$ is a stable state satisfying the following conditions:

$$\mathcal{L}(u) = \begin{cases} a_d & u = d \\ \infty & \mathsf{choices}_{\mathcal{S}}(u) = \emptyset \\ \min_{\prec} (\{a \mid (e, a) \in \mathsf{choices}_{\mathcal{S}}(u)\}) & \mathsf{choices}_{\mathcal{S}}(u) \neq \emptyset \end{cases}$$

$$\text{subject to} \sum_{e \in E} \mathcal{F}(e) \le k$$

where the choices from the neighbors of node $u$ are defined as:

$$\mathsf{choices}_{\mathcal{S}}(u) = \{ (e, a) \mid e = \langle u, v \rangle,\ a = \mathsf{trans}(e, \mathcal{L}(v)),\ a \neq \infty,\ \mathcal{F}(e) = 0 \}$$

The constraints require that every node has selected the best attribute (according to its preference relation) amongst those available from its neighbors. The destination's label must always be the initial attribute $a_d$. For verification, this attribute (or parts of it) may be symbolic, which helps model potentially unknown routing announcements from peers outside our network. For other nodes *u*, the selected attribute *a* is the minimal attribute from the *choices* available to *u*. Intuitively, to find the choices available to *u*, we consider the attributes *b* chosen by neighbors *v* of *u*. Then, if the edge between *v* and *u* has not failed, we push *b* along that edge, modifying it according to the trans function. Finally, failure scenarios are constrained so that the sum of the failures is at most *k*.
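The stability constraints can be checked directly for a candidate labelling. The minimal Python sketch below uses RIP-style integer attributes with `math.inf` for $\infty$. A directed edge `(u, v)` means $u$ receives from $v$, and the failure scenario is a set of failed links (each a frozenset) rather than the 0/1 map $\mathcal{F}$. The names `choices` and `is_solution` are hypothetical.

```python
import math

INF = math.inf


def rip_trans(edge, a):
    return INF if a >= 15 else a + 1


def choices(u, labels, edges, failed, trans):
    """choices_S(u): (edge, attribute) pairs that u receives over
    non-failed edges <u, v>, after applying the transfer function."""
    result = []
    for (x, v) in edges:
        if x != u or frozenset((x, v)) in failed:
            continue
        a = trans((x, v), labels[v])
        if a != INF:
            result.append(((x, v), a))
    return result


def is_solution(labels, failed, nodes, edges, dest, a_d, trans, k,
                key=lambda a: a):
    """Check the SPPF stability constraints for a given labelling and
    failure scenario; key orders attributes (smaller = preferred)."""
    if len(failed) > k:
        return False
    for u in nodes:
        if u == dest:
            expected = a_d
        else:
            cs = choices(u, labels, edges, failed, trans)
            expected = min((a for _, a in cs), key=key) if cs else INF
        if labels[u] != expected:
            return False
    return True


links = [("a", "d"), ("a", "b"), ("b", "d")]
edges = links + [(v, u) for (u, v) in links]   # both directions
nodes = ["d", "a", "b"]
one_fail = {frozenset(("b", "d"))}
print(is_solution({"d": 0, "a": 1, "b": 2}, one_fail, nodes, edges,
                  "d", 0, rip_trans, k=1))   # True
```

With the link between `b` and `d` failed, the labelling that keeps `b` at one hop is rejected, because `b`'s only remaining choice is the two-hop route through `a`.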

#### **4 Network Approximation Theory**

Given a concrete SPPF and an abstract $\widehat{\mathit{SPPF}}$, a network abstraction is a pair of functions $(f, h)$ that relate the two. The topology abstraction $f : V \to \hat{V}$ maps each node in the concrete network to a node in the abstract network, while the attribute abstraction $h : A_\infty \to \hat{A}_\infty$ maps a concrete attribute to an abstract attribute. The latter allows us to relate networks running protocols where nodes may appear in the attributes (*e.g.*, in the Path component of BGP). The goal of Origami is to compute compact SPPFs that may be used for verification. These compact SPPFs must be closely related to their concrete counterparts; otherwise, properties verified on the compact SPPF may not hold of the concrete one. Section 4.1 defines *label approximation*, which provides an intuitive, high-level, semantic relationship between abstract and concrete networks. We also explain some of the consequences of this definition and its limitations. Unfortunately, while this broad definition serves as an important theoretical objective, it is difficult to use directly in an efficient algorithm. Section 4.2 continues our development by explaining two *well-formedness* requirements of network policies that play a key role in establishing label approximation *indirectly*. Finally, Sect. 4.3 defines *effective SPPF approximation* for well-formed SPPFs. This definition is more conservative than label approximation, but it is easier to work with algorithmically and, moreover, it implies label approximation.

#### **4.1 Label Approximation**

Intuitively, we say the abstract $\widehat{\mathit{SPPF}}$ label-approximates the concrete SPPF when, at every node, the concrete SPPF has at least as good a route as the corresponding abstract node in $\widehat{\mathit{SPPF}}$.

**Definition 1 (Label Approximation).** *Consider any solutions* $\mathcal{S}$ *to* SPPF *and* $\hat{\mathcal{S}}$ *to* $\widehat{\mathit{SPPF}}$ *and their respective labelling components* $\mathcal{L}$ *and* $\hat{\mathcal{L}}$. *We say* $\widehat{\mathit{SPPF}}$ *label-approximates* SPPF *when* $\forall u \in V.\ h(\mathcal{L}(u)) \preceq \hat{\mathcal{L}}(f(u))$.

If we can establish a label approximation relation between a concrete and an abstract network, we can typically verify a number of properties of the abstract network and be sure they hold of the concrete network. However, the details of exactly which properties we can verify depend on the specifics of the preference relation (≺). For example, in an OSPF network, preference is determined by weighted path length. Therefore, if we know an abstract node has a path of weighted length *n*, we know that its concrete counterparts have paths of weighted length of at most *n*. More importantly, since "no route" is the worst route, we know that if a node has any route to the destination in the abstract network, so do its concrete counterparts.
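For hop-count attributes, Definition 1 reduces to a pointwise comparison, as the minimal Python sketch below shows. It assumes both labellings are already known, `h` is the identity, and $\preceq$ is `<=`. The node names and hop counts are hypothetical, chosen to mirror the Fig. 1d discussion. `label_approximates` is an illustrative name.

```python
import math

INF = math.inf


def label_approximates(concrete_labels, abstract_labels, f, h, preceq):
    """Definition 1: each concrete label, mapped through h, is at least as
    good as (preceq) the label of the corresponding abstract node f(u)."""
    return all(preceq(h(concrete_labels[u]), abstract_labels[f[u]])
               for u in concrete_labels)


def shortest_preceq(a, b):
    return a <= b          # fewer hops is at least as good; INF is worst


def identity(a):
    return a


f = {"d": "d", "b1": "b12", "b2": "b12"}
abstract = {"d": 0, "b12": 2}            # b12 uses a two-hop route
concrete = {"d": 0, "b1": 2, "b2": 1}    # b2 still has a direct link
print(label_approximates(concrete, abstract, f, identity, shortest_preceq))  # True

concrete_bad = {"d": 0, "b1": INF, "b2": 1}
print(label_approximates(concrete_bad, abstract, f, identity, shortest_preceq))  # False
```

The second call fails exactly because an abstract route exists while a concrete counterpart has none, which is the case label approximation rules out.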

**Limitations.** Some properties are beyond the scope of our tool (independent of the preference relation). For example, our model cannot reason about quantitative properties such as bandwidth, probability of congestion, or latency.

#### **4.2 Well-Formed SPPFs**

Not all SPPFs are well-behaved. For example, some never converge and others do not provide sensible models of any real network. To avoid dealing with such poorly behaved models, we demand henceforth that all SPPFs are *well-formed*. Well-formedness entails that an SPPF is strictly monotonic and isotonic:

$$\forall a, e.\ a \neq \infty \Rightarrow a \prec \mathsf{trans}(e, a) \qquad \text{(strict monotonicity)}$$

$$\forall a, b, e.\ a \preceq b \Rightarrow \mathsf{trans}(e, a) \preceq \mathsf{trans}(e, b) \qquad \text{(isotonicity)}$$
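Origami checks these two properties with an SMT solver. The Python sketch below swaps in brute-force enumeration over a small, finite attribute domain instead, purely for illustration. `check_well_formed` and the example policies are hypothetical. `drops_best` loosely imitates the Fig. 2 situation by dropping the preferred route while accepting a worse one.

```python
import math
from itertools import product

INF = math.inf


def check_well_formed(attrs, trans, prec, edges):
    """Return the first violation of strict monotonicity or isotonicity
    found by enumerating a finite attribute domain (plus INF), or None."""
    domain = list(attrs) + [INF]

    def preceq(x, y):
        return x == y or prec(x, y)

    for e in edges:
        for a in domain:
            if a != INF and not prec(a, trans(e, a)):
                return ("strict monotonicity", e, a)
        for a, b in product(domain, repeat=2):
            if preceq(a, b) and not preceq(trans(e, a), trans(e, b)):
                return ("isotonicity", e, a, b)
    return None


def shorter(x, y):
    return x < y


def rip(e, a):
    return INF if a >= 15 else a + 1


def drops_best(e, a):
    # Hypothetical policy: drop the most preferred route (1) but accept 2.
    return INF if a == 1 else a + 1


print(check_well_formed(range(16), rip, shorter, ["e"]))      # None
print(check_well_formed([1, 2], drops_best, shorter, ["e"]))  # ('isotonicity', 'e', 1, 2)
```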


**Fig. 2.** Concrete network (left) and its corresponding abstraction (right). Nodes $c_1, c_2$ prefer to route through $b_1$ (resp. $b_2$) or $g$ over $a$. Node $b_1$ (resp. $b_2$) drops routing messages that have traversed $b_2$ (resp. $b_1$). Red lines indicate a failed link. Dotted lines show a topologically available but unused link. A purple arrow shows a route unusable by traffic from $b_1$. (Color figure online)

Monotonicity and isotonicity properties are often cited [7,8] as desirable properties of routing policies because they guarantee network convergence and prevent persistent oscillation. In practice too, prior studies have revealed that almost all real network configurations have these properties [13,19].

In our case, these properties help establish additional invariants that tie the routing behavior of concrete and abstract networks together. To gain some intuition as to why, consider the networks of Fig. 2. The concrete network on the left runs BGP with a routing policy under which node $c_1$ (and $c_2$) prefers to route through node $g$ instead of $a$, and $b_1$ drops announcements coming from $b_2$. In this scenario, the similarly configured abstract node $\hat{b}_{12}$ can reach the destination: it simply takes a route that happens to be less preferred by $\hat{c}_{12}$ than the one it would have used had there been no failure. However, in the concrete analogue, $b_1$ is *unable* to reach the destination because $c_1$ only sends it the route through $b_2$, which it cannot use. In this case, the concrete network has more topological paths than the abstract network, but, counterintuitively, due to the network's routing policy, this turns out to be a disadvantage. Hence having more paths does not necessarily make nodes more reachable. As a consequence, without further restrictions, abstract networks cannot soundly over-approximate the number of failures in a concrete network, even though this over-approximation is essential to the soundness of our theory.

The underlying issue here is that the networks of Fig. 2 are not isotonic. Suppose $\mathcal{L}'(c_1)$ is the route from $c_1$ to the destination through node $a$. We have $\mathcal{L}(c_1) \prec \mathcal{L}'(c_1)$, but since the transfer function over $\langle b_1, c_1 \rangle$ drops routes that have traversed node $b_2$, we have $\mathsf{trans}(\langle b_1, c_1 \rangle, \mathcal{L}'(c_1)) \prec \mathsf{trans}(\langle b_1, c_1 \rangle, \mathcal{L}(c_1))$. Notice that $\mathcal{L}'(c_1)$ is essentially the route that the abstract network uses, *i.e.*, $h(\mathcal{L}'(c_1)) = \hat{\mathcal{L}}(\hat{c}_{12})$; hence the formula above implies that $\hat{\mathcal{L}}(\hat{b}_{12}) \prec h(\mathcal{L}(b_1))$, which violates label approximation. Fortunately, if a network is strictly monotonic and isotonic, such situations never arise. Moreover, we check these properties via an SMT solver using a local and efficient test.

#### **4.3 Effective SPPF Approximation**

We seek abstract networks that label-approximate given concrete networks. Unfortunately, to directly check that a particular abstract network label-approximates a concrete network, one must effectively compute their solutions. Doing so would defeat the entire purpose of abstraction, which seeks to analyze large concrete networks *without the expense of computing their solutions directly*.

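In order to turn approximation into a useful computational tool, we define *effective approximation*, a set of simple conditions on the abstraction functions $f$ and $h$ that are *local* and can be checked efficiently. When true, these conditions imply label approximation. Intuitively, effective approximations impose three main restrictions on the abstraction functions:

1. The topology abstraction $f$ satisfies the *forall-exists* condition: every concrete edge $\langle u, v \rangle$ maps to an abstract edge $\langle f(u), f(v) \rangle$, and for every abstract edge $\langle \hat{u}, \hat{v} \rangle$ and every concrete node $u$ with $f(u) = \hat{u}$, there exists a concrete node $v$ with $f(v) = \hat{v}$ and $\langle u, v \rangle \in E$.
2. The attribute abstraction $h$ preserves the preference relation:

   $$\forall a, b.\ a \prec b \iff h(a) \mathrel{\widehat{\prec}} h(b)$$

3. The transfer function and the abstraction functions commute *(trans-equivalence)*:

   $$\forall e, a.\ h(\mathsf{trans}(e, a)) = \widehat{\mathsf{trans}}(f(e), h(a))$$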
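The topological (forall-exists) condition is a simple local check over the adjacency lists. The minimal Python sketch below treats edges as directed pairs. The toy topology is hypothetical and loosely modeled on the Fig. 3 discussion: splitting the $a$ nodes as $\{a_1, a_3\}, \{a_2, a_4\}$ satisfies the condition, while $\{a_1, a_2\}, \{a_3, a_4\}$ does not. `forall_exists` is an illustrative name.

```python
from collections import defaultdict


def forall_exists(concrete_edges, abstract_edges, f):
    """Check the forall-exists condition on directed edges (u, v): every
    concrete edge maps to an abstract edge, and for every abstract edge
    (au, av) each concrete node in au has some neighbour in av."""
    abstract = set(abstract_edges)
    members = defaultdict(set)          # abstract node -> concrete nodes
    for node, a in f.items():
        members[a].add(node)
    succ = defaultdict(set)             # concrete adjacency
    for u, v in concrete_edges:
        if (f[u], f[v]) not in abstract:
            return False
        succ[u].add(v)
    return all(any(f[v] == av for v in succ[u])
               for (au, av) in abstract for u in members[au])


links = [("b1", "a1"), ("b1", "a2"), ("b2", "a3"), ("b2", "a4")]
edges = links + [(v, u) for (u, v) in links]
good = {"b1": "b", "b2": "b", "a1": "a13", "a3": "a13", "a2": "a24", "a4": "a24"}
bad = {"b1": "b", "b2": "b", "a1": "a12", "a2": "a12", "a3": "a34", "a4": "a34"}


def image(f):
    return {(f[u], f[v]) for (u, v) in edges}


print(forall_exists(edges, image(good), good))  # True
print(forall_exists(edges, image(bad), bad))    # False
```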

We prove that when these conditions hold, we can approximate any solution of the concrete network with a solution of the abstract network.

**Theorem 1.** *Given a well-formed SPPF and its effective approximation* $\widehat{\mathit{SPPF}}$*, for any solution* $\mathcal{S}$ *of SPPF there exists a solution* $\hat{\mathcal{S}}$ *of* $\widehat{\mathit{SPPF}}$ *such that their labelling functions are label-approximate.*

#### **5 The Verification Procedure**

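The first step of verification is to compute a small abstract network that satisfies our SPPF *effective approximation* conditions. We do so by grouping network nodes and edges with equivalent policy and checking the forall-exists topological condition, using an algorithm reminiscent of earlier work [3]. Typically, however, this minimal abstraction will not contain enough paths to prove any fault-tolerance property. To identify a finer abstraction for which we can prove a fault-tolerance property, we repeatedly:

1. search for a *plausible* candidate refinement of the current abstraction;
2. attempt to verify the property on that candidate with the SMT solver, stopping if verification succeeds;
3. otherwise, check whether the counterexample corresponds to at most $k$ concrete link failures, in which case it is a real violation; and
4. if the counterexample is spurious, learn from it and continue with a new abstraction.
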

**Fig. 3.** Eight nodes in (a) are represented using two nodes in the abstract network (b). Pictures (c) and (d) show two possible ways to refine the abstract network (b).

Both the search for plausible candidates and the way we learn a new abstraction to continue the counterexample-guided loop are explained below.

#### **5.1 Searching for Plausible Candidates**

Though we might know an abstraction is not sufficient to verify a given fault tolerance property, there are many possible refinements. For example, Fig. 3(a) presents a simple concrete network that will tolerate a single link failure, and Fig. 3(b) presents an initial abstraction. The initial abstraction will not tolerate any link failure, so we must refine the network. To do so, we choose an abstract node to divide into two abstract nodes for the next iteration. We must also decide which concrete nodes correspond to each abstract node. For example, in Fig. 3(c), node $\hat{a}$ has been split into $\hat{a}_{13}$ and $\hat{a}_{24}$. The subscripts indicate the assignment of concrete nodes to abstract ones.

A significant complication is that once we have generated a new abstraction, we must check that it continues to satisfy the effective approximation conditions and, if not, do more work. Figure 3(c) satisfies those conditions, but if we were to split $\hat{a}$ into $\hat{a}_{12}$ and $\hat{a}_{34}$ rather than $\hat{a}_{13}$ and $\hat{a}_{24}$, the forall-exists condition would be violated: some of the concrete nodes associated with $\hat{b}$ are connected to the concrete nodes in $\hat{a}_{12}$ but not to the ones in $\hat{a}_{34}$, and vice versa. To repair the violation of the forall-exists condition, we need to split additional nodes, in this case the $\hat{b}$ node, giving rise to Fig. 3(d).

Overall, the process of splitting nodes and then recursively splitting further nodes to repair the forall-exists condition generates many possible candidate abstractions. A key question is which candidate we should select to proceed with the abstraction refinement algorithm.

One consideration is size: a smaller abstraction avoids taxing the verifier, which is the ultimate goal. However, there are many small abstractions that we can quickly dismiss. Technically, we say an abstraction is *plausible* if all nodes of interest have at least $k + 1$ edge-disjoint paths to the destination. Implausible abstractions cause nodes to become unreachable with $k$ failures. To check whether an abstraction is plausible, we compute the *min-cut* of the graph. Figure 3(d) is an example of an implausible abstraction that arose after a poorly chosen split of node $\hat{a}$. In this case, no node has 2 or more paths to the destination, and hence nodes might not be able to reach the destination when there is a failure.
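The plausibility test can be phrased as a max-flow computation. By Menger's theorem, the number of edge-disjoint paths between two nodes equals the size of a minimum edge cut. The minimal Python sketch below uses BFS augmenting paths with unit capacities on an undirected graph. The function names are hypothetical, and the two toy graphs are only loosely modeled on Fig. 3(b) and Fig. 3(c).

```python
from collections import defaultdict, deque


def edge_disjoint_paths(links, src, dst, limit=None):
    """Maximum number of edge-disjoint src-dst paths in an undirected graph
    (= size of a minimum src-dst edge cut, by Menger's theorem), computed
    with BFS augmenting paths and unit capacity in each direction."""
    if src == dst:
        raise ValueError("src and dst must differ")
    cap = defaultdict(int)
    adj = defaultdict(set)
    for u, v in links:
        cap[(u, v)] += 1
        cap[(v, u)] += 1
        adj[u].add(v)
        adj[v].add(u)
    flow = 0
    while limit is None or flow < limit:
        parent = {src: None}
        queue = deque([src])
        while queue and dst not in parent:
            x = queue.popleft()
            for y in adj[x]:
                if y not in parent and cap[(x, y)] > 0:
                    parent[y] = x
                    queue.append(y)
        if dst not in parent:
            break                      # no augmenting path left
        y = dst
        while parent[y] is not None:   # push one unit of flow along the path
            x = parent[y]
            cap[(x, y)] -= 1
            cap[(y, x)] += 1
            y = x
        flow += 1
    return flow


def is_plausible(links, nodes_of_interest, dst, k):
    """Every node of interest needs at least k + 1 edge-disjoint paths."""
    return all(edge_disjoint_paths(links, n, dst, limit=k + 1) >= k + 1
               for n in nodes_of_interest)


coarse = [("b", "a"), ("a", "d")]
split = [("b", "a13"), ("b", "a24"), ("a13", "d"), ("a24", "d")]
print(is_plausible(coarse, ["b"], "d", k=1))  # False
print(is_plausible(split, ["b"], "d", k=1))   # True
```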

Clearly, verification using an implausible abstraction will fail. Instead of considering such abstractions as candidates for verification, the refinement algorithm tries refining them further. A key decision the algorithm needs to make when refining an abstraction is *which abstract node to split*. For instance, the optimal refinement of Fig. 3(b) is Fig. 3(c). If we were to split node $\hat{b}$ instead of $\hat{a}$, we would end up with a sub-optimal (in terms of size) abstraction. Intuitively, splitting a node that lies on the min-cut and can reach the destination (*e.g.*, $\hat{a}$) increases the number of paths that its neighbors on the unreachable side of the min-cut (*e.g.*, $\hat{b}$) can use to reach the destination.

To summarize, the search for new candidate abstractions involves (1) splitting nodes in the initial abstraction, (2) repairing the abstraction to ensure the forall-exists condition holds, (3) checking that the generated abstraction is *plausible*, and if not, (4) splitting additional nodes on the min cut. This iterative process often generates many candidates. The *breadth* parameter of the search bounds the total number of plausible candidates we generate between verification efforts. Of all the plausible candidates generated, we choose the smallest one to verify using the SMT solver.

#### **5.2 Learning from Counterexamples**

Any nodes of an abstraction that have a min cut of less than $k + 1$ definitely cannot tolerate $k$ faults. If an abstraction is plausible, it satisfies a *necessary* condition for source-destination connectivity, but not a *sufficient* one: misconfigured routing policy can still cause nodes to be unreachable by modifying and/or subsequently dropping routing messages. For instance, the abstract network of Fig. 3c is plausible for one failure, but if $\hat{b}$'s routing policy blocks routes of either $\hat{a}_{13}$ or $\hat{a}_{24}$, then the abstract network will not be 1-fault tolerant. Indeed, it is the complexity of routing policy that necessitates a heavy-weight verification procedure in the first place, rather than a simpler graph algorithm alone.

In a plausible abstraction, if the verifier computes a solution to the network that violates the desired fault-tolerance property, some node could not reach the destination because one or more of its paths to the destination could not be used to route traffic. We use the generated counterexample to learn edges that could not be used to route traffic due to the policy on them. To do so, we inspect the computed solution to find nodes $\hat{u}$ that (1) lack a route to the destination (*i.e.*, $\hat{\mathcal{L}}(\hat{u}) = \infty$), (2) have a neighbor $\hat{v}$ that has a valid route to the destination, and (3) are connected to $\hat{v}$ by a link that has not failed. These conditions imply that the route to the destination is missing not because link failures disabled all paths to the destination, but because the network policy dropped some routes. For example, in Fig. 3c, consider the case where $\hat{b}$ does not advertise routes from $\hat{a}_{13}$ and $\hat{a}_{24}$; if the link between $\hat{a}_{13}$ and $d$ fails, then $\hat{a}_{13}$ has no route to the destination, and we learn that the edge $\langle \hat{b}, \hat{a}_{13} \rangle$ cannot be used. In fact, since $\hat{a}_{13}$ and $\hat{a}_{24}$ belonged to the same abstract group $\hat{a}$ before we split them, their routing policies are equal modulo the abstraction function by trans-equivalence. Hence, we can infer that in a symmetric scenario, the link $\langle \hat{b}, \hat{a}_{24} \rangle$ will also be unusable.
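The three conditions above, plus the symmetric inference, fit in a short Python sketch. It assumes labels from the counterexample, with `math.inf` for $\infty$, undirected links, and failed links given as frozensets. A learned pair is written `(v, u)`, with the neighbour that has a route first, matching the prose notation $\langle \hat{b}, \hat{a}_{13} \rangle$. The `siblings` map (nodes split from the same abstract node) and all function names are hypothetical, and the labels mirror the Fig. 3c scenario.

```python
import math

INF = math.inf


def learn_unusable_edges(labels, links, failed):
    """Return pairs (v, u) such that u has no route, its neighbour v has
    one, and their link has not failed: the route was topologically
    available but was not usable by u because of policy."""
    learned = set()
    for x, y in links:
        for u, v in ((x, y), (y, x)):
            if (labels[u] == INF and labels[v] != INF
                    and frozenset((u, v)) not in failed):
                learned.add((v, u))
    return learned


def extend_by_symmetry(learned, siblings):
    """If (v, u) is unusable and u2 was split from the same abstract node
    as u, assume (v, u2) is unusable too (justified by trans-equivalence)."""
    result = set(learned)
    for v, u in learned:
        for u2 in siblings.get(u, ()):
            result.add((v, u2))
    return result


links = [("b", "a13"), ("b", "a24"), ("a13", "d"), ("a24", "d")]
labels = {"d": 0, "a24": 1, "b": 2, "a13": INF}   # link a13-d failed
learned = learn_unusable_edges(labels, links, {frozenset(("a13", "d"))})
siblings = {"a13": ["a24"], "a24": ["a13"]}
print(sorted(extend_by_symmetry(learned, siblings)))  # [('b', 'a13'), ('b', 'a24')]
```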

Given a set of unusable edges learned from a counterexample, we restrict the min-cut problems that define the plausible abstractions by disallowing the use of those edges. Essentially, we enrich the refinement algorithm's topology-based analysis (based on min-cut) with knowledge about the policy; the algorithm must then generate abstractions that are plausible without using those edges. With those edges disabled, the refinement process continues as before.

#### **6 Implementation**

Origami uses the Batfish network analysis framework [12] to parse network configurations and then translates them into a pure functional intermediate representation (IR) designed for network verification. This IR represents the structure of routing messages and the semantics of transfer and preference relations using standard functional data structures.

The translation generates a separate functional program for each destination subnet. In other words, if a network has 100 top-of-rack switches and each such switch announces the subnets for 30 adjacent hosts, then Origami generates 100 functional programs (*i.e.*, problem instances). We separately apply our algorithms to each problem instance, converting the functional program to an SMT formula when necessary according to the algorithm described earlier. Since vendor routing configuration languages have limited expressive power (*e.g.*, no loops or recursion), the translation requires no user-provided invariants. We use Z3 [10] to determine satisfiability of the SMT problems. Solving the problems separately (and in parallel) provides a speedup over solving the routing problem for all destinations simultaneously: because each problem is specialized to a particular destination, opportunities arise for optimizations that reduce the problem size, such as dead code elimination.

**Optimizing Refinement:** During the course of implementing Origami, we discovered a number of optimizations to the refinement phase.


**Minimizing Counterexamples:** When the SMT solver returns a counterexample, it often uses the maximum number of failures. This is not surprising, as maximizing failures simplifies the SMT problem. Unfortunately, it also confounds our analysis of whether a counterexample is real or spurious.


**Fig. 4.** Compression results. **Topo:** the network topology. **Con V/E:** Number of nodes/edges of concrete network. **Fail:** Number of failures. **Abs V/E:** Number of nodes/edges of the best abstraction. **Ratio:** Compression ratio (nodes/edges). **Abs Time:** Time taken to find abstractions (sec.). **SMT Calls:** Number of calls to the SMT solver. **SMT Time:** Time taken by the SMT solver (sec.).

To mitigate this problem, we *could* ask the solver to minimize the returned counterexample, *i.e.*, to return a counterexample that corresponds to the fewest concrete link failures. We could do so by providing the solver with additional constraints specifying the number of concrete links that correspond to each abstract link, and then asking it to return a counterexample that minimizes the sum of concrete failures. Of course, doing so requires solving a more expensive optimization problem. Instead, given an initial (possibly spurious) counterexample, we simply ask the solver to find a new counterexample that additionally satisfies the constraint that the number of concrete failures stays within the failure bound. If it succeeds, we have found a real counterexample. If it fails, we use the original counterexample to refine our abstraction.
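The cheaper query can be pictured as a search for failures that both violate the property and fit within the concrete failure budget. The sketch below replaces the SMT solver with brute-force enumeration, purely for illustration. It assumes each failed abstract link is charged the number of concrete links it represents (`multiplicity`); `violates` is a hypothetical callback standing in for the property check.

```python
from itertools import combinations
from typing import Callable, Dict, Hashable, Optional, Sequence, Set


def find_real_counterexample(
    abstract_links: Sequence[Hashable],
    multiplicity: Dict[Hashable, int],
    k: int,
    violates: Callable[[Set[Hashable]], bool],
) -> Optional[Set[Hashable]]:
    """Look for abstract link failures whose concrete cost is at most k and
    that still violate the property (brute-force stand-in for the SMT query)."""
    for size in range(len(abstract_links) + 1):
        for failed in combinations(abstract_links, size):
            concrete = sum(multiplicity[link] for link in failed)
            if concrete <= k and violates(set(failed)):
                return set(failed)  # a real counterexample within the bound
    return None  # nothing within budget: refine using the original counterexample
```

Here failing link "ab" alone would violate the property but costs 3 concrete failures, so with k = 2 the search returns the cheaper pair {"bc", "cd"}, and with k = 1 it finds nothing.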

#### **7 Evaluation**

We evaluate Origami on a collection of synthetic data center networks that use BGP to implement shortest-path routing policies over common industrial data center topologies. Data centers are a good fit for our algorithms, as they can be very large but are highly symmetrical and designed for fault tolerance. Data center topologies (often called *fattree* topologies) are typically organized in layers, with each layer containing many routers. Each router in a layer is connected to a number of routers in the layer above (and below) it. The precise number of neighbors to which a router is connected, and the pattern of said connections, is part of the topology definition. We focus on two common topologies: fattree topologies used at Google (labelled FT20, FT40 and SP40 below) and a different fattree used at Facebook (labelled FB12). These are relatively large data center topologies, ranging from 500 to 2000 nodes and 8000 to 64000 edges.

SP40 uses a pure shortest-paths routing policy. For the other experiments (FT20, FT40, FB12), we augment shortest paths with additional policy that selectively drops routing announcements, for example, disabling "valley routing" (which allows up-down-up-down routes through the data center instead of just up-down routes) in various places. The pure shortest-paths policy represents a best-case scenario for our technology, as it gives rise to perfect symmetry and makes our heuristics especially effective. By adding variations in routing policy, we provide a greater challenge for our tool.
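For intuition about the valley-routing restriction, the check below decides whether a path is "up-down" only. It is a hypothetical helper, not part of the evaluated configurations, and it assumes each router is tagged with its fattree layer (0 for top-of-rack, increasing upward).

```python
from typing import Sequence


def is_valley_free(layers: Sequence[int]) -> bool:
    """True iff the path only climbs and then descends (no down-then-up valley).

    `layers` lists the layer of each router along the path.
    """
    descending = False
    for prev, cur in zip(layers, layers[1:]):
        if cur > prev and descending:
            return False  # climbed again after having descended
        if cur < prev:
            descending = True
    return True
```

A policy that disables valley routing would drop announcements whose paths fail this check, such as the layer sequence 0, 1, 0, 1, 0.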

Experiments were done on a Mac with a 4 GHz i7 CPU and 16 GB memory.

#### **7.1 Compression Results**

Figure 4 shows the level of compression achieved, along with the time required for compression and verification. In most cases, we achieve a high compression ratio, especially in terms of links. This drastically reduces the number of possible failure combinations for the underlying verification process. The cases of 10 link failures on FT20 and 5 link failures on FbFT demonstrate another aspect of our algorithm. Neither topology can sustain that many link failures, *i.e.* some concrete nodes have fewer than 10 (resp. 5) neighbors. We can determine this as we refine the abstraction: there are (abstract) nodes that do not satisfy the min-cut requirement and cannot be refined further. This constitutes an actual counterexample and explains why the abstraction of FT20 for 10 link failures is smaller than the one for 5 link failures. Importantly, we did not use the SMT solver to find this counterexample, nor did we need to run a min cut on the much larger concrete topology. Intuitively, the rest of the network remained abstract, while the part that led to the counterexample became fully concrete.

#### **7.2 Verification Performance**

The verification time of Origami is dominated by abstraction time and SMT time, both shown in Fig. 4. In practice, some time is also spent parsing and preprocessing the configurations, but it is negligible. The abstraction time depends heavily on the size of the network and on the abstraction search breadth. In these experiments, the breadth was set to 25, a relatively high value.

While the verification time for a high number of link failures is not negligible, we found that verification without abstraction is essentially impossible. We used Minesweeper [2], the state-of-the-art SMT-based network verifier, to verify the same fault tolerance properties and it was unable to solve any of our queries. This is not surprising, as SMT-based verifiers do not scale to networks beyond the size of FT20 even without any link failures.

#### **7.3 Refinement Effectiveness**

We now evaluate the effectiveness of our search and refinement techniques.

**Effectiveness of Search.** To assess the effectiveness of the search procedure, we compute an initial abstraction of the FT20 network suitable for 5 link failures, using different values of the search breadth. In addition, we consider the impact of some of the heuristics described in Sect. 5. Figure 5 presents the size of the computed abstractions (the number of nodes on the y-axis and the number of edges on top of each bar) for various values of the search breadth and sets of heuristics:

– Heuristics off means that (almost) all heuristics are turned off. We still try to split nodes that are on the cut-set.

**Fig. 5.** The initial abstraction of FT20 for 5 link failures using different heuristics and search breadth. On top of the bars is the number of edges of each abstraction.


The results of this experiment show that to achieve effective compression ratios we need to employ both smart heuristics and a wide search through the space of abstractions. Increasing the search breadth might make the heuristics redundant; however, in most cases this would make the refinement process exceed acceptable time limits.

**Use of Counterexamples.** We now assess how important it is to (1) use symmetries in policy to infer more information from counterexamples, and (2) minimize the counterexample provided by the solver.

We see in Fig. 6 that disabling either technique increases the number of refinement iterations. While each refinement is performed quickly, the same cannot be guaranteed for the verification process that runs between refinements. Hence, it is important to keep the number of refinement iterations as low as possible.

#### **8 Related Work**

Our approach to network fault-tolerance verification draws heavily from ideas in prior work exploiting symmetry and abstraction in model checking [4,6,17] and automatic abstraction refinement via CEGAR [1,5,9]. However, we apply these ideas to network routing, which introduces different challenges and opportunities. For example, our notion of abstraction (∀∃−abstraction) differs from the typical existential abstraction used in model checking [6]. In addition, we have to deal with network topological structure and asymmetries introduced by failures.

Bonsai [3] and Surgeries [22] both leverage abstraction to accelerate verification for routing protocols and packet forwarding, respectively. Both tools compute a single abstract network that is bisimilar to the original concrete network. Alas, neither approach can be used to reason about properties when faults may occur.

Minesweeper [2] is a general approach to control plane verification based on a stable state encoding, which leverages an SMT solver in the back-end. It supports a wide range of routing protocols and properties, including fault tolerance properties. Our compression is complementary to such tools; it is used to alleviate the scaling problem that Minesweeper faces with large networks.

**Fig. 6.** Effectiveness of minimizing counterexamples and of learning unused edges. On top of the bars is the number of SMT calls.

With respect to verification of fault tolerance, ARC [13] translates a limited class of routing policies to a weighted graph where fault-tolerance properties can be checked using graph algorithms. However, ARC handles only shortest-path routing and cannot support stateful features such as BGP communities or local preference. While ARC applies graph algorithms on a statically-computed graph, we use graph algorithms as part of a refinement loop in conjunction with a general purpose solver.

#### **9 Conclusions**

We present a new theory of distributed routing protocols in the presence of bounded link failures, and we use the theory to develop algorithms for network compression and counterexample-guided verification of fault tolerance properties. In doing so, we observe that (1) even though abstract networks route differently from concrete ones in the presence of failures, the concrete routes wind up being "at least as good" as the abstract ones when networks satisfy reasonable well-formedness constraints, and (2) using efficient graph algorithms (min cut) in the middle of the CEGAR loop speeds the search for refinements.

We implemented our algorithms in a network verification tool called Origami. Evaluation of the tool on synthetic networks shows that our algorithms accelerate verification of fault tolerance properties significantly, making it possible to verify networks out of reach of other state-of-the-art tools.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **On the Complexity of Checking Consistency for Replicated Data Types**

Ranadeep Biswas<sup>1</sup>(B), Michael Emmi<sup>2</sup>, and Constantin Enea<sup>1</sup>

<sup>1</sup> Université de Paris, IRIF, CNRS, 75013 Paris, France, {ranadeep,cenea}@irif.fr

<sup>2</sup> SRI International, New York, NY, USA, michael.emmi@sri.com

**Abstract.** Recent distributed systems have introduced variations of familiar abstract data types (ADTs) like counters, registers, flags, and sets, that provide high availability and partition tolerance. These *conflict-free replicated data types* (CRDTs) utilize mechanisms to resolve the effects of concurrent updates to replicated data. Naturally these objects weaken their consistency guarantees to achieve availability and partition-tolerance, and various notions of *weak consistency* capture those guarantees.

In this work we study the tractability of CRDT-consistency checking. To capture guarantees precisely, and facilitate symbolic reasoning, we propose novel logical characterizations. By developing novel reductions from propositional satisfiability problems, and novel consistency-checking algorithms, we discover both positive and negative results. In particular, we show intractability for replicated flags, sets, counters, and registers, yet tractability for replicated growable arrays. Furthermore, we demonstrate that tractability can be redeemed for registers when each value is written at most once, for counters when the number of replicas is fixed, and for sets and flags when the number of replicas and variables is fixed.

#### **1 Introduction**

Recent distributed systems have introduced variations of familiar abstract data types (ADTs) like counters, registers, flags, and sets, that provide high availability and partition tolerance. These *conflict-free replicated data types* (CRDTs) [33] efficiently resolve the effects of concurrent updates to replicated data. Naturally they weaken consistency guarantees to achieve availability and partition-tolerance, and various notions of *weak consistency* capture such guarantees [8,11,29,35,36].

In this work we study the tractability of CRDT consistency checking; Fig. 1 summarizes our results. In particular, we consider *runtime verification*: deciding whether a given execution of a CRDT is consistent with its ADT specification. This problem is particularly relevant as distributed-system testing tools like Jepsen [25] are appearing; without efficient, general consistency-checking algorithms, such tools could be limited to specialized classes of errors like node crashes.

This work is supported in part by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 678177).

**Fig. 1.** The complexity of consistency checking for various replicated data types. We demonstrate intractability and tractability results in Sects. 3 and 4, respectively.

Our setting captures executions across a set of replicas as per-replica sequences of operations called *histories*. Roughly speaking, a history is *consistent* so long as each operation's return value can be justified according to the operations that its replica has observed so far. In the setting of CRDTs, the determination of a replica's observations is essentially an implementation choice: replicas are only obliged to observe their own operations and the predecessors of operations they have already observed. This relatively weak constraint on replicas' observations makes the CRDT consistency checking problem unique.

Our study proceeds in three parts. First, to precisely characterize the consistency of various CRDTs, and facilitate symbolic reasoning, we develop novel logical characterizations to capture their guarantees. Our logical models are built on a notion of *abstract execution*, which relates the operations of a given history with three separate relations: a *read-from* relation, governing the observations from which a given operation computes its return value; a *happens-before* relation, capturing the causal relationships among operations; and a *linearization* relation, capturing any necessary arbitration among non-commutative effects which are executed concurrently, e.g., following a *last-writer-wins* policy. Accordingly, we capture data type specifications with logical axioms interpreted over the read-from, happens-before, and linearization relations of abstract executions, reducing the consistency problem to: does there exist an abstract execution over the given history which satisfies the axioms of the given data type?

Second, we demonstrate the intractability of several replicated data types by reduction from propositional satisfiability (SAT) problems. In particular, we consider the 1-in-3 SAT problem [19], which asks for a truth assignment to the variables of a given set of clauses such that exactly one literal per clause is assigned true. Our reductions essentially simulate the existential choice of a truth assignment with the existential choice of the read-from and happens-before relations of an abstract execution. For a given 1-in-3 SAT instance, we construct a history of replicas obeying carefully-tailored synchronization protocols, which is consistent exactly when the corresponding SAT instance is positive.

Third, we develop tractable consistency-checking algorithms for individual data types and special cases: replicated growing arrays; multi-value and last-writer-wins registers, when each value is written only once; counters, when the number of replicas is bounded; and sets and flags, when the numbers of replicas and variables are bounded. While the algorithms for each case are tailored to the algebraic properties of the data types they handle, they essentially all function by constructing abstract executions incrementally, processing replicas' operations in prefix order.

The remainder of this article is organized around our three key contributions:


Section 7 overviews related work, and Sect. 8 concludes.

#### **2 A Logical Characterization of Replicated Data Types**

In this section we describe an axiomatic framework for defining the semantics of replicated data types. We consider a set of method names M, where each method m ∈ M has a number of arguments and a return value sampled from a data domain D. We use operation labels of the form $m(a) \Rightarrow_i b$ to represent the call of a method m ∈ M with argument a ∈ D, resulting in the value b ∈ D. Since there might be multiple calls to the same method with the same arguments and result, labels are tagged with a unique identifier i. We ignore identifiers when unambiguous.

The interaction between a data type implementation and a client is represented by a *history* h = ⟨Op, ro⟩, which consists of a set of operation labels Op and a partial *replica order* ro ordering operations issued by the client on the same replica. Usually, ro is a union of sequences, each sequence representing the operations issued on the same replica, and the *width* of ro, i.e., the maximum number of mutually unordered operations, gives the number of replicas in a given history.
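One way to make these definitions concrete is a small data model in which ro is stored, as in the common case, as one operation sequence per replica. This is a sketch under that assumption; the class and field names are ours. With this representation, the width is simply the number of non-empty sequences.

```python
from dataclasses import dataclass
from typing import Hashable, List, Set, Tuple


@dataclass(frozen=True)
class Op:
    """An operation label m(a) =>_i b."""
    ident: int             # unique identifier i
    method: str            # method name m
    args: Tuple            # argument(s) a
    ret: Hashable = None   # return value b


@dataclass
class History:
    """h = <Op, ro>, with ro given as one operation sequence per replica."""
    replicas: List[List[Op]]

    def ops(self) -> List[Op]:
        return [op for seq in self.replicas for op in seq]

    def ro(self) -> Set[Tuple[int, int]]:
        """Replica order as (earlier, later) identifier pairs, transitively closed."""
        return {
            (seq[i].ident, seq[j].ident)
            for seq in self.replicas
            for i in range(len(seq))
            for j in range(i + 1, len(seq))
        }

    def width(self) -> int:
        """Maximum number of mutually unordered operations."""
        return sum(1 for seq in self.replicas if seq)
```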

To characterize the set of histories h = ⟨Op, ro⟩ admitted by a certain replicated data type, we use *abstract executions* e = ⟨rf, hb, lin⟩, which include:


In this work, we consider replicated data types which satisfy *causal consistency* [26], i.e., updates which are related by cause and effect relations are observed by all replicas in the same order. This follows from the fact that the happens-before order is constrained to be a partial order, and thus transitive (other forms of weak consistency do not impose this constraint). Some of the replicated data types we consider in this paper do *not* use conflict-resolution policies based on timestamps; in those cases, the linearization order can be ignored.
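The causal-consistency requirement on happens-before can be checked mechanically for a candidate relation. The sketch assumes relations are sets of identifier pairs and that hb must contain ro (replicas observe their own operations); the function names are hypothetical.

```python
from typing import Hashable, Set, Tuple

Rel = Set[Tuple[Hashable, Hashable]]


def is_strict_partial_order(rel: Rel) -> bool:
    """Irreflexive and transitive; asymmetry then follows."""
    if any(a == b for (a, b) in rel):
        return False
    return all((a, d) in rel for (a, b) in rel for (c, d) in rel if b == c)


def valid_happens_before(hb: Rel, ro: Rel) -> bool:
    """hb must extend replica order and be a strict partial order (hence transitive)."""
    return ro <= hb and is_strict_partial_order(hb)
```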


**Fig. 2.** The axiomatic semantics of replicated data types. Quantified variables are implicitly distinct, and ∃!o denotes the existence of a unique operation o.

A *replicated data type* is defined by a set of first-order axioms Φ characterizing the relations in an abstract execution. A history h is *admitted* by a data type when there exists an abstract execution e such that h, e ⊨ Φ. The satisfaction relation ⊨ is defined as usual in first-order logic. The *admissibility problem* is the problem of checking whether a history h is admitted by a given data type.

In the following, we define the replicated data types for which we study the complexity of the admissibility problem. The axioms used to define them are listed in Figs. 2 and 3. These axioms use the function symbols meth(od), arg(ument), and ret(urn), interpreted over operation labels, whose semantics is self-explanatory.

#### **2.1 Replicated Sets and Flags**

The Add-Wins Set and Remove-Wins Set [34] are two implementations of a replicated set with operations add(x), remove(x), and contains(x) for adding, removing, and checking membership of an element x. Although the meaning of these methods is self-evident from their names, the result of conflicting concurrent operations is not evident. When concurrent add(x) and remove(x) operations are delivered to a certain replica, the Add-Wins Set chooses to keep the element x in the set, so every subsequent invocation of contains(x) on this replica returns *true*, while the Remove-Wins Set makes the dual choice of removing x from the set.

The formal definition of their semantics uses abstract executions where the read-from relation associates sets of add(x) and remove(x) operations to contains(x) operations. Therefore, the predicate ReadOk(o1, o2) is defined by

$$\mathsf{meth}(o_1) \in \{\mathsf{add}, \mathsf{remove}\} \land \mathsf{meth}(o_2) = \mathsf{contains} \land \mathsf{arg}(o_1) = \mathsf{arg}(o_2)$$

and the Add-Wins Set is defined by the following set of axioms:

$$\mathsf{ReadFrom}(\mathsf{ReadOk}) \land \mathsf{ReadFromMaximal}(\mathsf{ReadOk}) \land \mathsf{ReadAllMaximals}(\mathsf{ReadOk}) \land \mathsf{RetvalSet}(\mathsf{contains}, \mathit{true}, \mathsf{add})$$

ReadFromMaximal says that every operation read by a contains(x) is maximal among its hb-predecessors that add or remove <sup>x</sup> while ReadAllMaximals says that all such maximal hb-predecessors are read. The RetvalSet instantiation ensures that a contains(x) returns *true* iff it reads-from at least one add(x).

The definition of the Remove-Wins Set is similar, except for the parameters of RetvalSet, which become RetvalSet(contains, *false*,remove), i.e., a contains(x) returns *false* iff it reads-from at least one remove(x).
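Given a fixed happens-before relation, the axioms above determine what a contains(x) may read and return. The sketch computes the hb-maximal add(x)/remove(x) predecessors (ReadFromMaximal together with ReadAllMaximals) and applies the RetvalSet instantiation for each policy. It checks one candidate abstract execution rather than deciding admissibility; the operation encoding and the treatment of an empty read-from set under Remove-Wins (element absent) are our assumptions.

```python
from typing import Dict, Hashable, Set, Tuple

Ops = Dict[int, Tuple[str, Hashable]]  # identifier -> (method, element)
Rel = Set[Tuple[int, int]]


def read_from(c: int, ops: Ops, hb: Rel) -> Set[int]:
    """hb-maximal add(x)/remove(x) operations that happen before contains(x) c."""
    x = ops[c][1]
    preds = {o for (o, t) in hb if t == c and ops[o][0] in ("add", "remove") and ops[o][1] == x}
    return {o for o in preds if not any((o, p) in hb for p in preds)}


def contains_result(c: int, ops: Ops, hb: Rel, policy: str = "add-wins") -> bool:
    rf = read_from(c, ops, hb)
    if policy == "add-wins":
        # RetvalSet(contains, true, add): true iff some add(x) is read
        return any(ops[o][0] == "add" for o in rf)
    # RetvalSet(contains, false, remove): false iff some remove(x) is read.
    # Assumption: when nothing is read, x is treated as absent.
    return bool(rf) and not any(ops[o][0] == "remove" for o in rf)
```

With a concurrent add(x) and remove(x) both visible, the two policies disagree; once the remove is ordered after the add, both return false.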

The Enable-Wins Flag and Disable-Wins Flag are implementations of a set of flags with operations: enable(x), disable(x), and read(x), where enable(x) turns the flag x to true, disable(x) turns x to false, while read(x) returns the state of the flag x. Their semantics is similar to the Add-Wins Set and Remove-Wins Set, respectively, where enable(x), disable(x), and read(x) play the role of add(x), remove(x), and contains(x), respectively. Their axioms are defined as above.

#### **2.2 Replicated Registers**

We consider two variations of replicated registers, called Multi-Value Register (MVR) and Last-Writer-Wins Register (LWW) [34], which maintain a set of registers and provide write(x,v) operations for writing a value v to a register x and read(x) operations for reading the content of a register x (the domain of values is kept unspecified since it is irrelevant). A read(x) operation of MVR returns *all* the values written by concurrent writes which are maximal among its happens-before predecessors, thereby leaving the responsibility for resolving conflicts between concurrent writes to the client. In contrast, a read(x) operation of LWW returns a single value chosen using a conflict-resolution policy based on timestamps: each written value is associated with a timestamp, and a read operation returns the most recent value w.r.t. the timestamps. This order between timestamps is modeled using the linearization order of an abstract execution.

Therefore, the predicate ReadOk(o1, o2) is defined by

$$\mathsf{meth}(o_1) = \mathsf{write} \land \mathsf{meth}(o_2) = \mathsf{read} \land \mathsf{arg}_1(o_1) = \mathsf{arg}(o_2) \land \mathsf{arg}_2(o_1) \in \mathsf{ret}(o_2)$$

(we use arg1(o1) to denote the first argument of a write operation, i.e., the register name, and arg2(o1) to denote its second argument, i.e., the written value) and the MVR is defined by the following set of axioms:

$$\mathsf{ReadFrom}(\mathsf{ReadOk}) \land \mathsf{ReadFromMaximal}(\mathsf{ReadOk}) \land \mathsf{ReadAllMaximals}(\mathsf{ReadOk}) \land \mathsf{RetvalReg}$$

where RetvalReg ensures that a read(x) operation reads from a write(x,v) operation, for each value v in the set of returned values<sup>1</sup>.

LWW is obtained from the definition of MVR by replacing ReadAllMaximals with the axiom LinLWW, which ensures that every write(x, _) operation which happens-before a read(x) operation is linearized before the write(x, _) operation from which the read(x) takes its value (when these two write operations are different). This definition of LWW is inspired by the "bad-pattern" characterization in [6], corresponding to their causal convergence criterion.
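Under a fixed abstract execution, the two register semantics differ only in which visible writes a read reports. The sketch assumes the linearization is given as a list of identifiers that extends hb, so the lin-last visible write is also hb-maximal as LinLWW requires, and that every read sees at least the initial write of footnote 1. The encoding and function names are hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple

Ops = Dict[int, Tuple[str, str, Hashable]]  # identifier -> (method, register, value)
Rel = Set[Tuple[int, int]]


def visible_writes(r: int, ops: Ops, hb: Rel) -> Set[int]:
    """write(x, _) operations that happen before the read(x) r."""
    reg = ops[r][1]
    return {o for (o, t) in hb if t == r and ops[o][0] == "write" and ops[o][1] == reg}


def mvr_read(r: int, ops: Ops, hb: Rel) -> Set[Hashable]:
    """MVR: the values of all hb-maximal visible writes."""
    preds = visible_writes(r, ops, hb)
    maximal = {o for o in preds if not any((o, p) in hb for p in preds)}
    return {ops[o][2] for o in maximal}


def lww_read(r: int, ops: Ops, hb: Rel, lin: List[int]) -> Hashable:
    """LWW: the value of the visible write that comes last in the linearization."""
    preds = visible_writes(r, ops, hb)
    return ops[max(preds, key=lin.index)][2]
```

For two concurrent writes of 1 and 2 after an initial write of 0, MVR reports both values, while LWW reports whichever write the linearization places last.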

#### **2.3 Replicated Counters**

The replicated counter datatype [34] maintains a set of counters interpreted as integers (the counters can become negative). This datatype provides operations inc(x) and dec(x) for incrementing and decrementing a counter x, and read(x) operations to read the value of the counter x. The semantics of the replicated counter is quite standard: a read(x) operation returns the value computed as the difference between the number of inc(x) operations and dec(x) operations among its happens-before predecessors. The axioms defined below will enforce the fact that a read(x) operation reads-from all its happens-before predecessors which are inc(x) or dec(x) operations.

Therefore, the predicate ReadOk(o1, o2) is defined by

$$\mathsf{meth}(o_1) \in \{\mathsf{inc}, \mathsf{dec}\} \land \mathsf{meth}(o_2) = \mathsf{read} \land \mathsf{arg}(o_1) = \mathsf{arg}(o_2)$$

and the replicated counter is defined by the following set of axioms:

$$\mathsf{ReadFrom}(\mathsf{ReadOk}) \land \mathsf{ClosedRF}(\mathsf{ReadOk}) \land \mathsf{RetvalCounter}.$$
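Because ClosedRF makes a read(x) observe every happens-before predecessor that increments or decrements x, its value is a signed count over those predecessors. A minimal sketch, assuming operations are encoded as (method, counter) pairs keyed by identifier (our encoding, with a hypothetical function name):

```python
from typing import Dict, Hashable, Set, Tuple

Ops = Dict[int, Tuple[str, Hashable]]  # identifier -> (method, counter)
Rel = Set[Tuple[int, int]]


def counter_read(r: int, ops: Ops, hb: Rel) -> int:
    """Number of inc(x) minus number of dec(x) among the hb-predecessors of read(x) r."""
    x = ops[r][1]
    delta = {"inc": 1, "dec": -1}
    return sum(
        delta[ops[o][0]]
        for (o, t) in hb
        if t == r and ops[o][0] in delta and ops[o][1] == x
    )
```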

<sup>1</sup> For simplicity, we assume that every history contains a set of write operations writing the initial values of variables, which precede every other operation in replica order.

**Fig. 3.** Axioms used to define the semantics of RGA.

#### **2.4 Replicated Growable Array**

The Replicated Growing Array (RGA) [32] is a replicated list used for text-editing applications. RGA supports three operations: addAfter(a,b), which adds the character b immediately after the occurrence of the character a (assumed to be present in the list); remove(a), which removes a (assumed to be present in the list); and read(), which returns the list contents. It is assumed that a character is added at most once<sup>2</sup>. Conflicts between concurrent addAfter operations that add a character immediately after the same character are resolved using timestamps (i.e., each added character is associated with a timestamp, and the order between characters depends on the order between the corresponding timestamps), which in the axioms below are modeled by the linearization order.

Figure <sup>3</sup> lists the axioms defining RGA. ReadFromRGA ensures that:


Then, RetvalRGA ensures that a read operation o<sub>1</sub> happening after an operation adding a character a reads from a remove(a) operation when a does not occur in the list returned by o<sub>1</sub> (the history must contain a remove(a) operation because otherwise a would have occurred in the list returned by the read).

Finally, LinRGA models the conflict resolution policy by constraining the linearization order between addAfter(a, _) operations adding some character immediately after the same character a. As a particular case, LinRGA enforces that addAfter(a,b) is linearized before addAfter(a,c) when a read operation returns a list where c precedes b (addAfter(a,b) results in the list a · b, and applying addAfter(a,c) to a · b results in the list a · c · b). However, this is not sufficient: assume that the history contains the two operations addAfter(a,b) and addAfter(a,c) along with two operations remove(b) and addAfter(b,d). Then, a read operation returning the list a · c · d must enforce that addAfter(a,b) is linearized before addAfter(a,c), because this is the only order between these two operations that can lead to the result a · c · d, i.e., executing addAfter(a,b), addAfter(b,d), remove(b), addAfter(a,c) in this order. LinRGA deals with any scenario where arbitrarily many characters can be removed from the list: $\mathsf{rf}^*_{\mathsf{addAfter}}$ is the reflexive and transitive closure of the projection of rf on addAfter operations, and $<_{o_5}$ denotes the order between characters in the list returned by the read operation o<sub>5</sub>.

<sup>2</sup> In a practical context, this can be enforced by tagging characters with replica identifiers and sequence numbers.

<sup>3</sup> This element is not returned by read operations.
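The a · c · d example can be replayed mechanically. The sketch below executes operations in a given order with plain sequential list semantics (no tombstones or timestamps, so it is not an RGA implementation) and enumerates all executable orders of the four operations; every order that produces a · c · d places addAfter(a,b) before addAfter(a,c), as argued above. The function names are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Op = Tuple[str, ...]  # ("addAfter", a, b) or ("remove", a)


def replay(initial: Sequence[str], ops: Sequence[Op]) -> List[str]:
    """Apply operations in order with sequential list semantics."""
    lst = list(initial)
    for op in ops:
        if op[0] == "addAfter":
            _, a, b = op
            lst.insert(lst.index(a) + 1, b)  # ValueError if a is absent
        elif op[0] == "remove":
            lst.remove(op[1])                # ValueError if absent
        else:
            raise NotImplementedError(op[0])
    return lst


def orders_yielding(initial: Sequence[str], ops: Sequence[Op], target: List[str]) -> List[Tuple[Op, ...]]:
    """All executable orders of `ops` whose replay produces `target`."""
    found = []
    for order in permutations(ops):
        try:
            if replay(initial, order) == target:
                found.append(order)
        except ValueError:
            continue  # some operation referred to a character not in the list
    return found
```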

#### **3 Intractability for Registers, Sets, Flags, and Counters**

In this section we demonstrate that checking consistency is intractable for many widely-used data types. While this is not completely unexpected, since some related consistency-checking problems like sequential consistency are also intractable [20], it contrasts with recent tractability results for checking strong consistency (i.e., linearizability) of common non-replicated data types like sets, maps, and queues [15]. In fact, in many cases we show that intractability holds even if the number of replicas is fixed.

Our proofs of intractability follow the general structure of Gibbons and Korach's proofs for the intractability of checking sequential consistency (SC) for atomic registers with read and write operations [20]. In particular, we reduce a specialized type of NP-hard propositional satisfiability (SAT) problem to checking whether histories are admitted by a given data type. While our construction borrows from Gibbons and Korach's, the adaptation from SC to CRDT consistency requires a significant extension to handle the consistency relaxation represented by abstract executions: rather than a direct sequencing of threads' operations, CRDT consistency requires the construction of three separate relations: read-from, happens-before, and linearization.

Technically, our reductions start from the 1-in-3 SAT problem [19]: given a propositional formula $\bigwedge_{i=1}^{m} (\alpha_i \lor \beta_i \lor \gamma_i)$ over variables $x_1, \ldots, x_n$ with only positive literals, i.e., $\alpha_i, \beta_i, \gamma_i \in \{x_1, \ldots, x_n\}$, does there exist an assignment to the variables such that exactly one of $\alpha_i, \beta_i, \gamma_i$ per clause is assigned *true*? The proofs of Theorems 1 and 2 reduce 1-in-3 SAT to CRDT consistency checking.
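For small instances, the 1-in-3 SAT problem can be solved by exhaustive search, which is handy for testing a reduction. The sketch below (exponential in n; 0-based variable indices and the function names are our conventions) also computes the vector of positions of the true literal in each clause, i.e., the r ∈ {0,1,2}<sup>m</sup> used in the proof of Theorem 1.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Clause = Tuple[int, int, int]  # variable indices of alpha_i, beta_i, gamma_i


def one_in_three_sat(clauses: Sequence[Clause], n: int) -> Optional[Tuple[bool, ...]]:
    """Find an assignment making exactly one literal true in every clause."""
    for bits in product((False, True), repeat=n):
        if all(bits[a] + bits[b] + bits[c] == 1 for a, b, c in clauses):
            return bits
    return None


def positions(clauses: Sequence[Clause], bits: Sequence[bool]) -> List[int]:
    """Index (0, 1, or 2) of the true literal in each clause."""
    return [[bits[v] for v in clause].index(True) for clause in clauses]
```

The four clauses over every triple of four variables are unsatisfiable: each variable occurs in three clauses, so the number of true literals would have to be a multiple of 3 and equal to 4 at the same time.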

**Theorem 1.** *The admissibility problem is NP-hard when the number of replicas is fixed for the following data types: Add-Wins Set, Remove-Wins Set, Enable-Wins Flag, Disable-Wins Flag, Multi-Value Register, and Last-Writer-Wins Register.*


**Fig. 4.** The encoding of a 1-in-3 SAT problem $\bigwedge_{i=1}^{m} (\alpha_i \lor \beta_i \lor \gamma_i)$ over variables $x_1, \ldots, x_n$ as a 3-replica history of a flag data type. Besides the flag variable $x_j$ for each propositional variable $x_j$, the encoding adds per-replica variables $y_j$ for synchronization barriers.

*Proof.* We demonstrate a reduction from the 1-in-3 SAT problem. For a given problem $p = \bigwedge_{i=1}^{m} (\alpha_i \lor \beta_i \lor \gamma_i)$ over variables $x_1, \ldots, x_n$, we construct a 3-replica history $h_p$ of the flag data type (either enable- or disable-wins), as illustrated in Fig. 4. The encoding includes a flag variable $x_j$ for each propositional variable $x_j$, along with a per-replica flag variable $y_j$ used to implement synchronization barriers. Intuitively, executions of $h_p$ proceed in $m + 1$ rounds: the first round corresponds to the assignment of a truth valuation, while subsequent rounds check the validity of each clause given the assignment. The reductions to sets and registers are slight variations on this proof, in which the Read, Enable, and Disable operations are replaced with Contains, Add, and Remove, respectively, for sets, and with Read, Write of value 1, and Write of value 0, respectively, for registers.

It suffices to show that the constructed history $h_p$ is admitted if and only if the given problem p is satisfiable. Since the flag data type does not constrain the linearization relation of its abstract executions, we consider only the read-from and happens-before components. It is straightforward to verify that the happens-before relations of $h_p$'s abstract executions necessarily order:


In other words, replicas appear to execute atomically per round, in a round-robin fashion. Furthermore, since all operations in a given round happen before the operations of subsequent rounds, the values of flag variables are consistent across rounds, i.e., they are as read by the first replica to execute in a given round, and as determined in the initial round either by conflict resolution (enable- or disable-wins) or by happens-before, in case conflict resolution would have been inconsistent with subsequent reads.

In the "if" direction, let $r \in \{0, 1, 2\}^m$ be the positions of the positively-assigned variables in each clause, e.g., $r_i = 0$ implies $\alpha_i = \mathit{true}$ and $\beta_i = \gamma_i = \mathit{false}$. We construct an abstract execution $e_r$ in which the happens-before relation sequences the operations of replica $r_i$ before those of replica $(r_i + 1) \bmod 3$, and in turn before those of replica $(r_i + 2) \bmod 3$. In other words, the replicas in round i appear to execute in cyclic left-to-right order, starting with the replica $r_i$, whose reads correspond to the satisfying assignment of $(\alpha_i \vee \beta_i \vee \gamma_i)$. The read-from relation of $e_r$ relates each Read($x_j$) = *true* operation to the most recent Enable($x_j$) operation in happens-before order, which is unique since happens-before sequences the operations of all rounds; the case for Read($x_j$) = *false* and Disable($x_j$) is symmetric. It is then straightforward to verify that $e_r$ satisfies the axioms of the enable- or disable-wins flag, and thus $h_p$ is admitted.

In the "only if" direction, let e be an abstract execution of $h_p$, and let $r \in \{0, 1, 2\}^m$ be the replicas that execute first in each round according to the happens-before order of e. It is straightforward to verify that the assignment in which a given variable is set to *true* iff the replica encoding its positive assignment in some clause executes first in its round, i.e.,

$$x_j = \begin{cases} \mathit{true} & \text{if } \exists i.\ (r_i = 0 \land \alpha_i = x_j) \lor (r_i = 1 \land \beta_i = x_j) \lor (r_i = 2 \land \gamma_i = x_j) \\ \mathit{false} & \text{otherwise}, \end{cases}$$

is a satisfying assignment to p.
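The two directions of the proof translate between assignments and "first replica per round" vectors r. The sketch below is our own illustration, with hypothetical function names and 1-based variable indices. It computes r from a 1-in-3 satisfying assignment (the "if" direction) and recovers the assignment from r using the case definition of $x_j$ above (the "only if" direction). For an arbitrary r the recovered assignment need not satisfy p; the proof relies on the consistency of $h_p$ to guarantee that it does.

```python
def rounds_from_assignment(clauses, x):
    """'If' direction: r_i is the position (0, 1 or 2) of the literal of
    clause i that is true under a 1-in-3 satisfying assignment x
    (x is a tuple of booleans, position j-1 holding x_j)."""
    return [next(k for k, v in enumerate(clause) if x[v - 1]) for clause in clauses]

def assignment_from_rounds(clauses, r, n):
    """'Only if' direction: x_j is true iff, for some clause i, the replica
    r_i that executes first in round i encodes x_j's position in clause i."""
    x = [False] * n
    for clause, ri in zip(clauses, r):
        x[clause[ri] - 1] = True
    return tuple(x)

clauses = [(1, 2, 3), (2, 3, 4)]
r = rounds_from_assignment(clauses, (False, False, True, False))  # [2, 1]
print(assignment_from_rounds(clauses, r, 4))  # (False, False, True, False)
```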

Theorem 1 establishes intractability of consistency for the aforementioned sets, flags, and registers, independently of the number of replicas. In contrast, our proof of Theorem 2 for counter data types depends on the number of replicas, since our encoding requires two replicas per propositional variable. Intuitively, since counter increments and decrements are commutative, the initial round in the previous encoding would have fixed all counter values to zero. Instead, the next encoding isolates initial increments and decrements to independent replicas. The weaker result is indeed tight since checking counter consistency with a fixed number of replicas is polynomial time, as Sect. 5 demonstrates.

**Theorem 2.** *The admissibility problem for the Counter data type is NP-hard.*

*Proof.* We demonstrate a reduction from the 1-in-3 SAT problem. For a given problem $p = \bigwedge_{i=1}^{m}(\alpha_i \vee \beta_i \vee \gamma_i)$ over variables $x_1, \ldots, x_n$, we construct a history $h_p$ of the counter data type over $2n + 3$ replicas, as illustrated in Fig. 5.

Besides the differences imposed by the commutativity of counter increments and decrements, our reduction follows the same strategy as the proof of Theorem 1: the happens-before relation of $h_p$'s abstract executions orders every pair of operations in distinct rounds (of Replicas 0–2), and every pair of operations in a given (non-initial) round. As before, Replicas 0–2 appear to execute atomically per round, in a round-robin fashion, and counter variables are consistent across rounds. The key difference is that here, the happens-before relations of abstract executions relate the operations of only one of Replica 2j+1 or 2j+2, for each j = 1, ..., n, to operations in subsequent rounds: the other replica's operations are never observed by other replicas. Our encoding ensures that exactly one of each pair is observed by ensuring that the counter y is incremented exactly n times, and by relying on the fact that every variable appears in some clause, so that a read that observed neither or both would yield the value zero, which is inconsistent with $h_p$. Otherwise, our reasoning follows the proof of Theorem 1, in which the read-from relation selects all increments and decrements of the same counter variable in happens-before order.

#### **4 Polynomial-Time Algorithms for Registers and Arrays**

We show that the problem of checking consistency is polynomial time for RGA, and even for LWW and MVR under the assumption that each value is written at most once, i.e., for each value v, the input history contains at most one write operation write(x,v). Histories satisfying this assumption are called *differentiated*. The latter is a restriction motivated by the fact that practical implementations of these datatypes are data-independent [38], i.e., their behavior doesn't depend on the concrete values read or written and any potential buggy behavior can be exposed in executions where each value is written at most once. Also, in a testing environment, this restriction can be enforced by tagging each value with a replica identifier and a sequence number.
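The differentiation assumption is easy to test for, and easy to enforce in a test harness by tagging values as described above. The sketch below assumes a history is a list of (method, variable, value) triples and reads "written at most once" per variable; `is_differentiated` and `TaggedWriter` are hypothetical names, not part of any existing tool.

```python
from collections import Counter
from itertools import count

def is_differentiated(history):
    """history: list of (method, var, value) triples.  True iff every
    (var, value) pair is written by at most one write operation."""
    writes = Counter((var, val) for meth, var, val in history if meth == "write")
    return all(c <= 1 for c in writes.values())

class TaggedWriter:
    """Hypothetical test-harness wrapper: each written value becomes
    (value, replica id, per-replica sequence number), hence unique."""
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self._seq = count()

    def tag(self, value):
        return (value, self.replica_id, next(self._seq))

w1, w2 = TaggedWriter(1), TaggedWriter(2)
tagged = [("write", "x", w1.tag(0)), ("write", "x", w2.tag(0)), ("write", "x", w1.tag(0))]
print(is_differentiated(tagged))                                  # True
print(is_differentiated([("write", "x", 0), ("write", "x", 0)]))  # False
```

Since a data-independent implementation ignores the tag, tagging changes no observable behavior beyond making the read-from relation recoverable from values.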

In all three cases, the feature that enables polynomial-time consistency checking is the fact that the read-from relation becomes fixed for a given history, i.e., if the history is consistent, then there exists exactly one read-from relation rf that satisfies the ReadFrom and Retval axioms, and rf can be derived syntactically from the operation labels (using those axioms). Then, our axiomatic characterizations enable a consistency checking algorithm which, roughly, consists of instantiating those axioms in order to compute an abstract execution.

The consistency checking algorithm for RGA, LWW, and MVR is listed in Algorithm 1. It computes the three relations rf, hb, and lin of an abstract execution using the datatype's axioms. The history is declared consistent iff there exist satisfying rf and hb relations, and the relations hb and lin computed this way are acyclic. The acyclicity requirement comes from the definition of abstract executions where hb and lin are required to be partial/total orders. While an abstract execution would require that lin is a total order, this algorithm computes a partial linearization order. However, any total order compatible with this partial linearization would satisfy the axioms of the datatype.

**Fig. 5.** The encoding of a 1-in-3 SAT problem $\bigwedge_{i=1}^{m}(\alpha_i \vee \beta_i \vee \gamma_i)$ over variables $x_1, \ldots, x_n$ as the history of a counter over $2n+3$ replicas. Besides the counter variables $x_j$ encoding propositional variables $x_j$, the encoding adds a variable $y$ encoding the number of initial increments and decrements, and a variable $z$ to implement synchronization barriers.

**Input**: A differentiated history h = ⟨Op, ro⟩ and a datatype T.

**Output**: *true* iff h satisfies the axioms of T.

```
rf ← ComputeRF(h, ReadFrom[T], Retval[T]);
if rf = ⊥ then return false;
hb ← (ro ∪ rf)+;
if hb is cyclic or h, rf, hb ⊭ ReadFromMaximal[T] ∧ ReadAllMaximals[T] then return false;
lin ← hb;
lin ← LinClosure(hb, Lin[T]);
if lin is cyclic then return false;
return true;
```

**Algorithm 1.** Consistency checking for RGA, LWW, and MVR. Re...[T] refers to an axiom of T, or *true* when T lacks such an axiom. The relation R<sup>+</sup> denotes the transitive closure of R.

ComputeRF computes the read-from relation rf satisfying the ReadFrom and Retval axioms. In the case of LWW and MVR, it defines rf as the set of all pairs formed of write(x,v) and read(x) operations where v belongs to the return value of the read. By Retval, each read(x) operation must be associated to at least one write(x, _) operation. Also, the fact that each value is written at most once implies that this rf relation is uniquely defined; e.g., for LWW, it is not possible to find two write operations that could be rf-related to the same read operation. In general, if there exists no rf relation satisfying these axioms, then ComputeRF returns a distinguished value ⊥ to signal a consistency violation. Note that the computation of the read-from relation for LWW and MVR is quadratic time<sup>4</sup> since the constraints imposed by the axioms relate only to the operation labels, i.e., the methods they invoke and their arguments. The case of RGA is slightly more involved because the axiom RetvalRGA introduces more read-from constraints based on the happens-before order, which includes ro and rf itself. In this case, the computation of rf relies on a fixpoint computation, described in Algorithm 2, which converges in at most quadratic time (the maximal size of rf). Essentially, we use the axiom ReadFromRGA to populate the read-from relation and then apply the axiom RetvalRGA iteratively, using the read-from constraints added in previous steps, until the computation converges.

<sup>4</sup> Assuming constant-time lookup/insert operations (e.g., using hashmaps), this complexity is linear time.
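For LWW and MVR, the ComputeRF step described above amounts to matching each returned value with the unique write of that value to the same variable. The sketch below assumes operations are given as a dictionary from ids to (method, variable, payload), where a read's payload is the set of returned values; it uses a hash map, in line with the linear-time remark in footnote 4. `compute_rf` is a hypothetical name, and None stands for ⊥.

```python
def compute_rf(ops):
    """ComputeRF for LWW/MVR on a differentiated history (a sketch).

    ops maps an operation id to (method, variable, payload): the written
    value for "write", or the set of returned values for "read".
    Returns the set of (write_id, read_id) pairs, or None standing for ⊥.
    """
    writer = {}                      # (variable, value) -> id of its unique write
    for oid, (meth, var, payload) in ops.items():
        if meth == "write":
            if (var, payload) in writer:
                raise ValueError("history is not differentiated")
            writer[(var, payload)] = oid
    rf = set()
    for oid, (meth, var, payload) in ops.items():
        if meth == "read":
            for v in payload:
                w = writer.get((var, v))
                if w is None:        # a returned value was never written: ⊥
                    return None
                rf.add((w, oid))
    return rf

ops = {1: ("write", "x", 1), 2: ("write", "x", 2), 3: ("read", "x", {1, 2})}
print(sorted(compute_rf(ops)))  # [(1, 3), (2, 3)]
```

An MVR read returning both concurrent values reads from both writes; an LWW read returns a single value and so gets exactly one rf edge.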

After computing the read-from relation, our algorithm defines the happens-before relation hb as the transitive closure of ro union rf. This is sound because none of the axioms of these datatypes enforce new happens-before constraints that are not already captured by ro and rf. Then, it checks whether the hb defined this way is acyclic and satisfies the datatype's axioms that constrain hb, i.e., ReadFromMaximal and ReadAllMaximals (when they are present).
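The hb step of Algorithm 1 is a transitive closure followed by a cycle check. A minimal sketch using Warshall's algorithm over per-node reachability sets, assuming relations are Python sets of pairs (the function names are ours):

```python
def transitive_closure(pairs, nodes):
    """R+ via Warshall's algorithm, using one reachability set per node."""
    reach = {u: set() for u in nodes}
    for a, b in pairs:
        reach[a].add(b)
    for k in nodes:                  # allow k as an intermediate node
        for u in nodes:
            if k in reach[u]:
                reach[u] |= reach[k]
    return {(u, v) for u in nodes for v in reach[u]}

def happens_before(ro, rf, nodes):
    """hb := (ro ∪ rf)+ as in Algorithm 1; None if hb is cyclic."""
    hb = transitive_closure(ro | rf, nodes)
    if any((u, u) in hb for u in nodes):
        return None
    return hb

print(sorted(happens_before({(1, 2)}, {(2, 3)}, [1, 2, 3])))  # [(1, 2), (1, 3), (2, 3)]
print(happens_before({(1, 2)}, {(2, 1)}, [1, 2]))             # None
```

A pair (u, u) in the closure is exactly a cycle through u, so the acyclicity test is a scan of the diagonal.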

Finally, in the case of LWW and RGA, the algorithm computes a (partial) linearization order that satisfies the corresponding Lin axioms. Starting from an initial linearization order which is exactly the happens-before, it computes new constraints by instantiating the universally quantified axioms LinLWW and LinRGA. Since these axioms are not "recursive", i.e., they don't enforce linearization order constraints based on other linearization order constraints, a standard instantiation of these axioms is enough to compute a partial linearization order such that any extension to a total order satisfies the datatype's axioms.

**Theorem 3.** *Algorithm 1 returns true iff the input history is consistent.*

The following corollary holds because Algorithm 1 runs in polynomial time; the degree of the polynomial depends on the number of quantifiers in the datatype's axioms. In particular, the computation of rf in Algorithm 2 is a least fixpoint computation which converges in at most a quadratic number of iterations (the maximal size of rf).

**Corollary 1.** *The admissibility problem is polynomial time for RGA, and for LWW and MVR on differentiated histories.*

**Input**: A history h = ⟨Op, ro⟩ of RGA.

**Output**: An rf satisfying ReadFromRGA ∧ RetvalRGA, if one exists; ⊥ otherwise.

```
rf ← {(o1, o2) : meth(o1) = addAfter, meth(o2) ∈ {addAfter, remove, read},
                 arg2(o1) = arg1(o2) ∨ arg2(o1) ∈ ret(o2)};
if h, rf ⊭ ReadFromRGA then return ⊥;
while true do
    rf1 ← ∅;
    foreach o1, o2 ∈ Op s.t. (o2, o1) ∈ (rf ∪ ro)+ and meth(o1) = read
            and meth(o2) = addAfter and arg2(o2) ∉ ret(o1) do
        if ∃o3 ∈ Op s.t. meth(o3) = remove and arg(o3) = arg2(o2) then
            rf1 ← rf1 ∪ {(o3, o1)};
        else
            return ⊥;
    if rf1 ⊆ rf then break;
    else rf ← rf ∪ rf1;
return rf;
```

**Algorithm 2.** The procedure ComputeRF for RGA.
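The fixpoint in Algorithm 2 can be sketched directly. This version rests on several assumptions: operations are given as id → (method, args, ret), where addAfter has args (anchor, element), remove has args (element,), and a read has no args and returns a list of elements. The foreach condition is read as requiring that the added element is absent from the read's return value (the only reading under which a matching remove is required), each element is assumed to be removed at most once, and the initial ReadFromRGA check is omitted because that axiom is not restated in this section. All names are hypothetical.

```python
def closure(pairs):
    """Transitive closure R+ of a finite relation given as a set of pairs."""
    result = set(pairs)
    while True:
        extra = {(a, d) for (a, b) in result for (c, d) in result if b == c} - result
        if not extra:
            return result
        result |= extra

def compute_rf_rga(ops, ro):
    """Fixpoint computation of rf for RGA (a sketch); None stands for ⊥."""
    meth = lambda o: ops[o][0]
    arg1 = lambda o: ops[o][1][0] if ops[o][1] else None
    arg2 = lambda o: ops[o][1][1]
    ret = lambda o: ops[o][2] or ()
    # Initial rf: an addAfter is read by operations that use its element.
    rf = {(o1, o2) for o1 in ops for o2 in ops
          if meth(o1) == "addAfter" and meth(o2) in ("addAfter", "remove", "read")
          and (arg2(o1) == arg1(o2) or arg2(o1) in ret(o2))}
    while True:
        hb = closure(rf | ro)
        rf1 = set()
        for (o2, o1) in hb:
            # A read that sees an add of e but does not return e must read a remove(e).
            if meth(o1) == "read" and meth(o2) == "addAfter" and arg2(o2) not in ret(o1):
                removes = [o3 for o3 in ops if meth(o3) == "remove" and arg1(o3) == arg2(o2)]
                if not removes:
                    return None
                rf1.add((removes[0], o1))
        if rf1 <= rf:
            return rf
        rf |= rf1

ops = {1: ("addAfter", ("root", "a"), None), 2: ("addAfter", ("a", "b"), None),
       3: ("remove", ("a",), None), 4: ("read", (), ["b"])}
print((3, 4) in compute_rf_rga(ops, {(1, 2), (2, 3)}))  # True
```

Here the read returns ["b"], observes the addition of "a" through the addition of "b", and is therefore forced to read from remove("a").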

#### **5 Polynomial-Time Algorithms for Replicated Counters**

In this section, we show that checking consistency for the replicated counter datatype becomes polynomial time assuming the number of replicas in the input history is fixed (i.e., the width of the replica order ro is fixed). We present an algorithm which constructs a valid happens-before order (note that the semantics of the replicated counter does not constrain the linearization order) incrementally, following the replica order. At any time, the happens-before order is uniquely determined by a *prefix mapping* that associates to each replica a *prefix* of the history, i.e., a set of operations which is downward-closed w.r.t. replica order (i.e., if it contains an operation, it contains all its ro-predecessors). This models the fact that the replica order is included in the happens-before and therefore, if an operation o<sub>1</sub> happens before another operation o<sub>2</sub>, then all the ro-predecessors of o<sub>1</sub> happen before o<sub>2</sub>. The happens-before order can be extended in two ways: (1) adding an operation issued on the replica i to the prefix of replica i, or (2) "merging" the prefix of a replica j into the prefix of a replica i (this models the delivery of an operation issued on replica j and all its happens-before predecessors to the replica i). Verifying that an extension of the happens-before is valid, i.e., that the return values of newly-added read operations satisfy the RetvalCounter axiom, does not depend on the happens-before order between the operations in the prefix associated to some replica (it is enough to count the inc and dec operations in that prefix). Therefore, the algorithm can be seen as a search in the space of prefix mappings. If the number of replicas in the input history is fixed, then the number of possible prefix mappings is polynomial in the size of the history, which implies that the search can be done in polynomial time.

Let h = (Op, ro) be a history. To simplify the notation, we assume that the replica order is a union of sequences, each sequence representing the operations issued on the same replica. Therefore, each operation o ∈ Op is associated with a replica identifier rep(o) ∈ $[1..n_h]$, where $n_h$ is the number of replicas in h.

**Input**: History h = (Op, ro), prefix map m, and set *seen* of invalid prefix maps.

**Output**: *true* iff there exist read-from and happens-before relations rf and hb such that m ⊆ hb, and h, rf, hb satisfies the counter axioms.

```
if m is complete then return true;
foreach replica i do
    foreach replica j ≠ i do
        m′ ← m[i ← m(i) ∪ m(j)];
        if m′ ∉ seen and checkCounter(h, m′, seen) then
            return true;
        seen ← seen ∪ {m′};
    if ∃o1. ro¹(last_i(m), o1) then
        if meth(o1) = read and arg(o1) = x ∧
           ret(o1) ≠ |{o ∈ m(i) | o = inc(x)}| − |{o ∈ m(i) | o = dec(x)}| then
            return false;
        m′ ← m[i ← m(i) ∪ {o1}];
        if m′ ∉ seen and checkCounter(h, m′, seen) then
            return true;
        seen ← seen ∪ {m′};
return false;
```

**Algorithm 3.** The procedure checkCounter, where ro<sup>1</sup> denotes the immediate ro-successor relation, and f[a ← b] updates function f with the mapping a ↦ b.

A *prefix* of h is a set of operations $Op' \subseteq Op$ such that all the ro-predecessors of operations in $Op'$ are also in $Op'$, i.e., $\forall o \in Op'.\ ro^{-1}(o) \subseteq Op'$. Note that the union of two prefixes of h is also a prefix of h. The *last operation* of replica i in a prefix $Op'$ is the ro-maximal operation o with rep(o) = i included in $Op'$. A prefix $Op'$ is called *valid* if $(Op', ro')$, where $ro'$ is the projection of ro on $Op'$, is admitted by the replicated counter.

A *prefix map* is a mapping m which associates a prefix of h to each replica $i \in [1..n_h]$. Intuitively, a prefix map defines for each replica i the set of operations which are "known" to i, i.e., happen before the last operation of i in its prefix. Formally, a prefix map m is *included* in a happens-before relation hb, denoted by $m \subseteq hb$, if for each replica $i \in [1..n_h]$, $hb(o, o_i)$ holds for each operation $o \in m(i) \setminus \{o_i\}$, where $o_i$ is the last operation of i in m(i). We call $o_i$ the *last operation* of i in m, and denote it by $last_i(m)$. A prefix map m is *valid* if it associates a valid prefix to each replica, and *complete* if it associates the whole history h to each replica i.

Algorithm 3 lists our algorithm for checking consistency of replicated counter histories. It is defined as a recursive procedure checkCounter that searches for a sequence of valid extensions of a given prefix map (initially, this prefix map is empty) until it becomes complete. The axiom RetvalCounter is enforced whenever the prefix map is extended with a new read operation (when the last operation of a replica i is "advanced" to a read operation). The following theorem states the correctness of the algorithm.
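The search can be sketched compactly by encoding each prefix as the number of operations it contains from each replica, which suffices because prefixes are downward-closed in ro. The sketch below is our own iterative rendering of the same search over prefix maps, with two choices made explicit: merges that leave a prefix unchanged are skipped, and a read whose value does not match the counted inc/dec operations rules out only that one extension. The operation encoding ('inc', x), ('dec', x), ('read', x, value) and the name `check_counter` are assumptions.

```python
def check_counter(replicas):
    """replicas: one list of operations per replica, in replica order.
    A prefix map m is a tuple of prefixes; m[i][j] is the number of
    operations of replica j that are known to replica i."""
    n = len(replicas)
    full = tuple(len(ops) for ops in replicas)
    delta = {"inc": 1, "dec": -1}

    def value(prefix, x):
        # #inc(x) - #dec(x) in the prefix; the hb order inside it is irrelevant.
        return sum(delta.get(op[0], 0)
                   for j in range(n) for op in replicas[j][:prefix[j]] if op[1] == x)

    start = tuple(tuple(0 for _ in range(n)) for _ in range(n))
    seen, stack = set(), [start]
    while stack:
        m = stack.pop()
        if m in seen:
            continue
        seen.add(m)
        if all(p == full for p in m):          # m is complete
            return True
        for i in range(n):
            for j in range(n):                 # (2) merge m(j) into m(i)
                merged = tuple(max(a, b) for a, b in zip(m[i], m[j]))
                if j != i and merged != m[i]:
                    stack.append(m[:i] + (merged,) + m[i + 1:])
            k = m[i][i]                        # (1) advance replica i
            if k < full[i]:
                op = replicas[i][k]
                if op[0] == "read" and op[2] != value(m[i], op[1]):
                    continue                   # RetvalCounter fails for this step
                advanced = m[i][:i] + (k + 1,) + m[i][i + 1:]
                stack.append(m[:i] + (advanced,) + m[i + 1:])
    return False

print(check_counter([[("read", "x", 1)], [("inc", "x")]]))  # True
print(check_counter([[("inc", "x"), ("read", "x", 2)]]))    # False
```

Each state is a tuple of $n_h$ count vectors, so the `seen` set is bounded by the number of prefix maps.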

**Theorem 4.** checkCounter(h, ∅, ∅) *returns true iff the input history is consistent.*

When the number of replicas is fixed, the number of prefix maps becomes polynomial in the size of the history. This follows from the fact that prefixes are uniquely defined by their ro-maximal operations, whose number is fixed.
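A short counting argument makes this bound explicit (our own derivation). Let $k_i$ be the number of operations issued on replica i. Every prefix is determined by how many operations of each replica it contains, and every such choice yields a prefix, so

$$\bigl|\{\text{prefixes of } h\}\bigr| = \prod_{i=1}^{n_h} (k_i + 1) \le (|Op| + 1)^{n_h}, \qquad \bigl|\{\text{prefix maps of } h\}\bigr| \le \bigl((|Op| + 1)^{n_h}\bigr)^{n_h} = (|Op| + 1)^{n_h^2}.$$

For a fixed $n_h$ this is a polynomial in $|Op|$, and each prefix map is explored at most once thanks to the set *seen*.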

**Corollary 2.** *The admissibility problem for replicated counters is polynomial time when the number of replicas is fixed.*

#### **6 Polynomial-Time Algorithms for Sets and Flags**

While Theorem 1 shows that the admissibility problem is NP-hard for replicated sets and flags even if the number of replicas is fixed, we show that this problem becomes polynomial time when, additionally, the number of values added to the set, or the number of flags, is also fixed. Note that this does not limit the number of operations in the input history, which can still be arbitrarily large. In the following, we focus on the Add-Wins Set, the other cases being very similar.

We propose an algorithm for checking consistency which is actually an extension of the one presented in Sect. 5 for replicated counters. The additional complexity in checking consistency for the Add-Wins Set comes from the validity of contains(x) return values, which requires identifying the maximal happens-before predecessors that add or remove x (which are not necessarily the maximal hb-predecessors altogether). In the case of counters, it was enough to count happens-before predecessors. Therefore, we extend the algorithm for replicated counters such that, along with the prefix map, we also keep track of the hb-maximal add(x) and remove(x) operations for each element x and each replica i. When extending a prefix map with a contains operation, these hb-maximal operations (which define a witness for the read-from relation) are enough to verify the RetValSet axiom. Extending the prefix of a replica with an add or remove operation (issued on the same replica), or merging the prefix of another replica into it, may require an update of these hb-maximal predecessors.
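The bookkeeping described above can be sketched as follows, under the usual add-wins reading that x is present iff some hb-maximal add/remove operation on x is an add. Representing each operation with the set of ids of its hb-predecessors is our own simplification, and the names `Op`, `local_update`, `merge`, and `contains_add_wins` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    oid: int
    kind: str            # "add" or "remove"
    elem: str
    preds: frozenset     # ids of all hb-predecessors of this operation

def local_update(maximal, op):
    """A local add/remove happens after everything the replica knows,
    so it becomes the only hb-maximal operation on its element."""
    new = dict(maximal)
    new[op.elem] = frozenset({op})
    return new

def merge(mine, theirs):
    """Merging another replica's prefix: union the maximal sets per element
    and drop operations that hb-precede another one in the union."""
    out = {}
    for e in set(mine) | set(theirs):
        union = mine.get(e, frozenset()) | theirs.get(e, frozenset())
        out[e] = frozenset(o for o in union if not any(o.oid in p.preds for p in union))
    return out

def contains_add_wins(maximal, elem):
    """Add-Wins: elem is present iff some hb-maximal op on elem is an add."""
    return any(o.kind == "add" for o in maximal.get(elem, ()))

a1 = Op(1, "add", "x", frozenset())
r2 = Op(2, "remove", "x", frozenset())              # concurrent with a1
b = merge(local_update({}, r2), local_update({}, a1))
print(contains_add_wins(b, "x"))                    # True: the add wins
b = local_update(b, Op(3, "remove", "x", frozenset({1, 2})))
print(contains_add_wins(b, "x"))                    # False
```

Operations of the same replica are ro-ordered, so each maximal set holds at most one operation per replica, which is the bound used in the next paragraph.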

When the number of replicas and elements is fixed, the number of read-from maps is polynomial in the size of the history; recall that the number of operations associated by a read-from map to a replica and set element is bounded by the number of replicas. Combined with the number of prefix maps being polynomial when the number of replicas is fixed, we obtain the following result.

**Theorem 5.** *Checking whether a history is admitted by the Add-Wins Set, Remove-Wins Set, Enable-Wins Flag, or the Disable-Wins Flag is polynomial time provided that the number of replicas and elements/flags is fixed.*

#### **7 Related Work**

Many have considered consistency models applicable to CRDTs, including causal consistency [26], sequential consistency [27], linearizability [24], session consistency [35], eventual consistency [36], and happens-before consistency [29]. Burckhardt et al. [8,11] propose a unifying framework to formalize these models. Many have also studied the complexity of verifying datatype-agnostic notions of consistency, including serializability, sequential consistency and linearizability [1,2,4,18,20,22,30], as well as causal consistency [6]. Our definition of the replicated LWW register corresponds to the notion of causal convergence in [6]. That work studies the complexity of the admissibility problem for the replicated LWW register. It shows that this problem is NP-complete in general and polynomial time when each value is written only once. Our NP-completeness result is stronger since it assumes a fixed number of replicas, and our algorithm for the case of unique values is more general and can be applied uniformly to MVR and RGA. While Bouajjani et al. [5,14] consider the complexity for individual linearizable collection types, we are the first to establish (in)tractability of individual replicated data types. Others have developed effective consistency checking algorithms for sequential consistency [3,9,23,31], serializability [12,17,18,21], linearizability [10,16,28,37], and even weaker notions like eventual consistency [7] and sequential happens-before consistency [13,15]. In contrast, we are the first to establish precise polynomial-time algorithms for runtime verification of replicated data types.

#### **8 Conclusion**

By developing novel logical characterizations of replicated data types, reductions from propositional satisfiability checking, and tractable algorithms, we have established a frontier of tractability for checking consistency of replicated data types. As far as we are aware, our results are the first to characterize the asymptotic complexity of consistency checking for CRDTs.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Communication-Closed Asynchronous Protocols**

Andrei Damian<sup>1</sup>, Cezara Drăgoi<sup>2</sup>, Alexandru Militaru<sup>1</sup>, and Josef Widder<sup>3,4</sup>(B)

<sup>1</sup> Politehnica University Bucharest, Bucharest, Romania; <sup>2</sup> Inria, ENS, CNRS, PSL, Paris, France; <sup>3</sup> TU Wien, Vienna, Austria (widder@forsyte.at); <sup>4</sup> Interchain Foundation, Baar, Switzerland

**Abstract.** The verification of asynchronous fault-tolerant distributed systems is challenging due to unboundedly many interleavings and network failures (e.g., process crashes or message loss). We propose a method that reduces the verification of asynchronous fault-tolerant protocols to the verification of round-based synchronous ones. Synchronous protocols are easier to verify due to fewer interleavings, bounded message buffers, etc. We implemented our reduction method and applied it to several state machine replication and consensus algorithms. The resulting synchronous protocols are verified using existing deductive verification methods.

#### **1 Introduction**

Fault tolerance protocols provide dependable services on top of unreliable computers and networks. One distinguishes asynchronous vs. synchronous protocols based on the semantics of parallel composition. Asynchronous protocols are crucial parts of many distributed systems for their better performance when compared against the synchronous ones. However, their correctness is very hard to obtain, due to the challenges of concurrency, faults, buffered message queues, and message loss and re-ordering at the network [5,19,21,26,31,35,37,42]. In contrast, reasoning about synchronous round-based semantics is simpler, as one only has to consider specific global states at round boundaries [1,8,10,11,13,17,29,32,40].

The question we address is how to connect both worlds, in order to exploit the advantage of verification in synchronous semantics when reasoning about asynchronous protocols. We consider asynchronous protocols that work in unreliable networks, which may lose and reorder messages, and where processes may crash. We focus on a class of protocols that solve state machine replication.

Due to the absence of a global clock, fault tolerance protocols implement an abstract notion of time to coordinate. The local state of a process maintains the value of the abstract time (potentially implicit), and a process timestamps the messages it sends accordingly. Synchronous algorithms do not need to implement an abstract notion of time: it is embedded in the definition of any synchronous computational model [9,15,18,28], and it is called the *round number*. The key insight of our results is the existence of a correspondence between values of the abstract clock in the asynchronous systems and round numbers in the synchronous ones. Using this correspondence, we make explicit the "hidden" round-based synchronous structure of an asynchronous algorithm.

Supported by: Austrian Science Fund (FWF) via NFN RiSE (S11405) and project PRAVDA (P27722); WWTF grant APALACHE (ICT15-103); French National Research Agency ANR project SAFTA (12744-ANR-17-CE25-0008-01).

**Fig. 1.** Asynchronous executions without jumps

**Fig. 2.** Asynchronous executions with jumps

We discuss our approach using a leader election algorithm. We consider n processes, which periodically and collectively elect a new leader. These periods are called *ballots*, and in each ballot at most one leader should be elected. The protocol in Fig. 3 solves leader election. In a ballot, a process that wants to become leader proposes itself by sending a message containing its identifier me to all, and it is elected if (1) a majority of processes receive its message, (2) these receivers send a message of leadership acknowledgment to the entire network, and (3) at least one process receives leadership acknowledgments for its leader estimate from a majority of processes. Figure 1(b) sketches an execution where process P3 fails to be elected in ballot 1 because the network drops all of P3's messages marked with a cross. All processes time out and no leader is elected in ballot 1. In the second ballot, P2 tries to become leader, the network delivers all messages between P1 and P2 in time, the two processes form a majority, and P2 is elected leader of ballot 2.

The protocol is defined by the asynchronous parallel composition of n copies of the code in Fig. 3. Each process executes a loop, where each iteration defines the process's behavior in a ballot. The variable ballot encodes the ballot number. The function coord() provides a local estimate of whether a process should try to become leader. Multiple processes may be selected by coord() as leader candidates, resulting in a race which is won by a process that is acknowledged by a majority (more than n/2 processes). Depending on the result of coord(), a process may take the leader branch on the left or the follower branch on the right. On the leader branch, a message is prepared and sent at line 7. The message contains the ballot number, the label NewBallot, and the leader's identity. On the other branch, a follower waits for a message from a process that proposes itself for the follower's current ballot number. This waiting is implemented by a loop, which terminates either on timeout or when a message is received. Next, the followers that received a message and the leader candidates send their leader estimate to all at lines 12 and 41, where the message contains the ballot number, the label AckBallot, and the leader's identity. If a process receives more than n/2 messages labeled with AckBallot and its current ballot, it checks, using all_same(mbox, leader) in lines 22 and 49, whether a majority of processes acknowledges the leadership of its estimate. In this case, it adds this information to the array log (which stores the locally elected leader of each ballot, if any) and outputs it, before it empties its mailbox and continues with the next iteration.

**Fig. 3.** Control flow graph of asynchronous leader election. (Color figure online)
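The acknowledgment step described above can be sketched as follows. This is not the code of Fig. 3: the message layout, the helper bodies, and the names `Msg`, `all_same`, and `ack_phase` are our own assumptions, and we count distinct senders so that duplicated deliveries cannot fake a majority.

```python
from collections import namedtuple

Msg = namedtuple("Msg", "sender bal lab leader")

def all_same(mbox, leader):
    """Hypothetical reading of the helper: every message names `leader`."""
    return all(m.leader == leader for m in mbox)

def ack_phase(mbox, ballot, leader, n, log):
    """If more than n/2 distinct processes sent AckBallot for `ballot` and
    all of them acknowledge `leader`, record the leader in log[ballot]."""
    acks = [m for m in mbox if (m.bal, m.lab) == (ballot, "AckBallot")]
    senders = {m.sender for m in acks}
    if len(senders) > n / 2 and all_same(acks, leader):
        log[ballot] = leader
        return True
    return False

log = {}
mbox = [Msg(1, 2, "AckBallot", 2), Msg(2, 2, "AckBallot", 2), Msg(3, 1, "NewBallot", 3)]
print(ack_phase(mbox, 2, 2, 3, log), log)  # True {2: 2}
```

With n = 3, P1 and P2 form a majority for ballot 2, as in the second ballot of Fig. 1(b); the stale ballot-1 message is filtered out.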

Figure 1(a) shows another execution of this protocol. Again, P3 sends NewBallot messages for ballot 1 to all processes. P3's NewBallot messages are delayed, and P2 times out in ballot 1, moving to ballot 2 where it is a leader candidate. The messages sent in ballot 2 are exchanged as in Fig. 1(b). Contrary to Fig. 1(b), while exchanging ballot 2 messages, the network delivers P3's NewBallot message from ballot 1 to P2. However, P2 ignores it, because of the receive statement in line 14 that only accepts messages for greater or equal (ballot, label) pairs. The message from ballot 1 arrived too "late" because P2 is already in ballot 2. Thus, the messages from ballot 1 have the same effect as if they were dropped, as in Fig. 1(b). The executions are equivalent from the local perspective of the processes: by applying a "rubber band transformation" [30], one can reorder transitions while maintaining the local control flow and the send/receive causality.

Another case of equivalent executions is given in Fig. 2. While P1 and P2 made progress, P3 was disconnected. In Fig. 2(a), while P3 is waiting for ballot 1 messages, the network delivers a message for ballot 20. P3 receives this message in line 29 and updates ballot in line 35. P3 thus "jumps forward in time", acknowledging P2's leadership in ballot 20. In Fig. 2(b), P3's timeout expires in all ballots from 1 to 19, without P3 receiving any messages. Thus, it does not change its local state (except the ballot number) in these ballots. For P3, these two executions are stutter equivalent. Reducing verification to the verification of executions like the ones on the right, i.e., *synchronous* executions, reduces the number of interleavings and drastically simplifies verification. In the following, we discuss conditions on the code that allow such a reduction.

*Communication Closure.* In our example, the variables ballot and label encode abstract time: let b and ℓ be their assigned values. Then abstract time ranges over $T = \{(b, \ell) : b \in \mathbb{N},\ \ell \in \{\text{NewBallot}, \text{AckBallot}\}\}$. We fix NewBallot to be less than AckBallot, and consider the lexicographical order over T. The sequence of $(b, \ell)$ induced by an execution at a process is monotonically increasing; thus $(b, \ell)$ encodes a notion of time. A protocol is *communication-closed* if (i) each process sends only messages timestamped with the current time, and (ii) each process receives only messages timestamped with the current or a higher time value. For such protocols, we show in Sect. 5 that for each asynchronous execution there is an equivalent synchronous one, in which processes go through the same sequence of local states. We use ideas from [17], but we allow reacting to future messages, which is a more permissive form of communication closure. This is essential for jumping forward, and thus for liveness in fault tolerance protocols.
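Conditions (i) and (ii) can be checked on a single process's trace once time values are mapped to totally ordered keys. The sketch below assumes a trace is a list of (kind, local time, message timestamp) events and additionally checks that local time never decreases; `tag` and `communication_closed` are hypothetical names, and this runtime check is only an illustration, not the static assertion-based check performed with VeriFast.

```python
LABELS = {"NewBallot": 0, "AckBallot": 1}   # NewBallot < AckBallot

def tag(ballot, label):
    """Map (b, ℓ) to a tuple; Python compares tuples lexicographically."""
    return (ballot, LABELS[label])

def communication_closed(trace):
    """trace: one process's events as (kind, local_time, msg_time).
    (i) sends carry the current time; (ii) receives carry the current or
    a later time; local time must be monotone along the trace."""
    prev = None
    for kind, now, ts in trace:
        if prev is not None and now < prev:
            return False
        if kind == "send" and ts != now:
            return False
        if kind == "recv" and ts < now:
            return False
        prev = now
    return True

# P3 in Fig. 2(a): it receives a ballot-20 message while in ballot 1 and jumps.
jump = [("send", tag(1, "NewBallot"), tag(1, "NewBallot")),
        ("recv", tag(1, "AckBallot"), tag(20, "AckBallot")),
        ("send", tag(20, "AckBallot"), tag(20, "AckBallot"))]
print(communication_closed(jump))                                          # True
print(communication_closed([("recv", tag(2, "NewBallot"), tag(1, "NewBallot"))]))  # False
```

The second trace is the "late" ballot-1 message of Fig. 1(a), which a communication-closed process must not accept.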

The challenge is to check communication closure at the code level. For this, we rely on user-provided "tag" annotations that specify the variables and the message fields representing local time and timestamps. A system of assertions formalizes that the user-provided annotations encode time and that the protocol is communication-closed w.r.t. this definition of time. In the example, the user provides (ballot, label) for local time and msg->bal and msg->lab for timestamps. In Fig. 3, we give example assertions that we add for the send and receive conditions (i) and (ii). These assertions only consider the local state, i.e., we do not need to capture the states of other processes or the message pool. We check the assertions with the static verifier Verifast [22].

*Synchronous Semantics.* Central to our approach is rewriting communication-closed asynchronous protocols into synchronous ones. To formalize synchronous semantics we introduce *multi Heard-Of protocols*, mHO for short. An mHO computation is structured into a sequence of mHO-rounds that execute synchronously. Figure 4 is an example of an mHO protocol. It has two mHO-rounds: NewBallot and AckBallot. Within a round, SEND functions, resp. UPDATE functions, are executed synchronously across all processes. The *round* number r is initially 0 and is incremented after each execution of an mHO-round. The key feature, which models faults and timeouts, is given by the heard-of sets *HO* [9]. For each round r and each process p, the set *HO*(p, r) contains the processes that p hears of in round r, i.e., whose messages are in the mailbox set taken as parameter by UPDATE (mbox). If the message from q to p is lost in round r, then q ∉ *HO*(p, r). Figures 1(b) and 2(b) are examples of executions of the protocol in Fig. 4. We extend the HO model [9] by allowing composition of *multiple* protocols. Verification in synchronous semantics, and thus in mHO, is simpler due to the round structure, which entails (i) no interleavings, (ii) no message buffers, and (iii) simpler invariants at the round boundaries.

**Fig. 4.** Control flow graph of synchronous leader election. (Color figure online)

*Rewriting to mHO.* We introduce a procedure that takes as input the asynchronous protocol together with tag annotations that have been checked, and produces the protocol rewritten in mHO, e.g., Fig. 3 is rewritten into Fig. 4. The rewriting is based on the idea of matching abstract time (ballot, label) to mHO round numbers r. Roughly, mHO-round NewBallot is obtained by combining the code of the first box on each path in Fig. 3 (the red boxes), and AckBallot is obtained by combining the second box on each path (the blue ones), as follows. The three message reception loops (the code in the boxes with highlighted background) are removed, because receptions are implicit in mHO; they correspond to a non-deterministic parameter of the UPDATE function. For each round, we record the context in which it is executed, e.g., the lower box for the follower is executed only if a NewBallot message was received (more details in Sect. 6).

*Verification.* The specification of the running example is that if two processes find the leader election for a ballot b successful (i.e., there is a log entry for b), then they agree on the leader. In general, to prove the specification, we need invariants that quantify over the ballot number b. As processes decide asynchronously, the proof for ballot 1, for some process p, must refer to the first log entry of processes that might already be in ballot 400. As discussed in [38], invariants in general need to capture the complete message history and the complete local state of processes. The proof of the same property for the synchronous protocol requires no such invariant. Due to communication closure, no messages need to be maintained after a round has terminated, that is, there is no message pool. The rewritten synchronous code has a simpler correctness proof, independent of the chosen verification method. One could use model checking [1,29,39,40], theorem-prover approaches [8,11], or deductive verification [14] for synchronous systems.

For several protocols, we formalized their specification in Consensus Logic [13], computed the equivalent mHO protocol, and proved it correct using the existing deductive verification engine from [13].

#### **2 Asynchronous Protocols**

All processes execute the same code, written in the core language in Fig. 5. Processes communicate via typed messages. Message payloads, denoted M, are wrappers of primitive or composite type. We denote by $\mathcal{M}$ the set of message types. Wrappers are used to distinguish payload types. Send instructions take as input an object of some payload type and either the receiver's identity or a special value denoting a send to all. Receive statements are non-blocking and return an object of payload type or NULL. Receive statements are parameterized by conditions (i.e., pointers to functions) on the values in the received messages (e.g., timestamps). At most one message is received at a time. If no message has been delivered, or none satisfies the condition, receive returns NULL. In Fig. 3, we give the definition of the function eq, used to filter messages acknowledging the leadership of a process. The followers also use geq, which checks whether the received message is timestamped with a value greater than or equal to the local time. We assume that each loop contains at least one send or receive statement. Iterative sequential computations are done in local functions, i.e., $f(\vec{e})$. The instructions in() and out() are used to communicate with an external environment.
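The filtering receive can be sketched in C as follows. This is a hypothetical illustration: the message layout (`bal`, `lab`, `leader`), the array-based pool, and the names `msg_t`, `pool_t`, `cond_fn`, and `recv_msg` are our assumptions, and because C has no closures, `eq` and `geq` take the local time as extra arguments instead of being partially applied as in recv(eq(ballot, label)). The actual definitions are those in Fig. 3.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message layout: timestamp (bal, lab) and a payload. */
typedef struct {
    int bal;
    int lab;    /* 0 = NewBallot, 1 = AckBallot */
    int leader; /* payload, e.g. the proposed leader */
} msg_t;

/* A receive condition: a predicate on a message and the local time. */
typedef bool (*cond_fn)(const msg_t *m, int ballot, int label);

/* Accept only messages stamped with exactly the current time. */
bool eq(const msg_t *m, int ballot, int label) {
    return m->bal == ballot && m->lab == label;
}

/* Accept messages stamped with the current time or a later one. */
bool geq(const msg_t *m, int ballot, int label) {
    return m->bal > ballot || (m->bal == ballot && m->lab >= label);
}

/* Delivered but not yet received messages; their order is irrelevant. */
typedef struct {
    msg_t items[64];
    size_t len;
} pool_t;

/* Non-blocking receive: remove one message satisfying cond from the pool,
 * copy it into *out and return out; return NULL if none qualifies. */
msg_t *recv_msg(pool_t *pool, cond_fn cond, int ballot, int label, msg_t *out) {
    for (size_t i = 0; i < pool->len; i++) {
        if (cond(&pool->items[i], ballot, label)) {
            *out = pool->items[i];
            pool->items[i] = pool->items[pool->len - 1]; /* swap-remove */
            pool->len--;
            return out;
        }
    }
    return NULL;
}
```

A message that fails the condition stays in the pool and is never consumed, which matches the observation that a late message has the same effect as a dropped one.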

The semantics of a protocol P is the asynchronous parallel composition of n copies of the same code, one copy per process, where n is a parameter. Formally, the state of a protocol P is a tuple ⟨s, msg⟩ where $s \in [P \rightarrow (\mathsf{Vars} \cup \{pc\}) \rightarrow \mathcal{D}]$ is a valuation of the variables in P in some data domain $\mathcal{D}$, pc represents the current control location, ranging over the set Loc of all protocol locations, and $msg \subseteq \biguplus_{M \in \mathcal{M}} (P \times \mathcal{D}(M) \times P)$ is the multiset of messages in transit (the network may lose and reorder messages). Given a process p ∈ P, s(p) is the local state of p, which is a valuation of p's local variables, i.e., $s(p) \in [\mathsf{Vars}_p \cup \{pc_p\} \rightarrow \mathcal{D}]$. The state of a crashed process is a wildcard state that matches any state. The messages sent by a process are added to the global pool of messages msg, and a receive statement removes a message from the pool. The interface operations in and out do not modify the local state of a process. An execution is an infinite sequence $s_0 A_0 s_1 A_1 \ldots$ such that for all $i \geq 0$, $s_i$ is a protocol state and $A_i \in \mathcal{A}$ is a local statement whose execution creates a transition of the form $\langle s, msg \rangle \xrightarrow{I,O} \langle s', msg' \rangle$, where {I, O} are the observable events generated by $A_i$ (if any). We denote by [[P]] the set of executions of the protocol P.

**Fig. 5.** Syntax of asynchronous protocols.

#### **3 Round-Based Model: mHO**

*Intra-procedural.* mHO captures round-based distributed algorithms and is a reformulation of the model in [9]. All processes execute the same code and the computation is structured in rounds. We denote by P the set of processes and n = |P| is a parameter. The central concept is the *HO*-set, where *HO*(p, r) contains the processes from which process p has *heard of* — has received messages from — in round r; this models faults and timeouts.

$$\begin{array}{lcl}
\textit{protocol} & ::= & \textit{interface}\;\; \textit{var\_decl}^{*}\;\; \textit{init}\;\; \textit{phase}\\
\textit{interface} & ::= & \mathbf{in}\colon () \rightarrow \textit{type} \;\mid\; \mathbf{out}\colon \textit{type} \rightarrow ()\\
\textit{init} & ::= & \mathtt{init}\colon () \rightarrow [P \rightarrow \mathsf{Vars} \rightarrow \mathcal{D}]\\
\textit{phase} & ::= & \textit{round}^{+}\\
\textit{round} & ::= & \mathbf{SEND}\colon [P \rightarrow \mathsf{Vars}] \rightarrow [P \rightarrow \mathsf{T}]\\
 & & \mathbf{UPDATE}\colon [P \rightarrow \mathsf{T}] \times [P \rightarrow \mathsf{Vars}] \rightarrow [P \rightarrow \mathsf{Vars}]
\end{array}$$

**Fig. 6.** mHO syntax.

*Syntax.* An mHO protocol consists of variable declarations (Vars is the set of variables), an initialization method init, and a non-empty sequence of rounds, called a *phase*; cf. Fig. 6. A phase is a fixed-size array of rounds. Each round has a send and an update method, parameterized by a type M (denoted by $\mathit{round}_M$), which represents the message payload. The method SEND has no side effects and returns the messages to be sent based on the local state of each sender; it returns a partial map from receivers to payloads. The method UPDATE takes as input the received messages and updates the local state of a process. It may communicate with an external client via in and out. For data computations, UPDATE uses iterative control structures only indirectly, via sequential functions, e.g., all_same(mbox, leader) in Fig. 3, which checks whether the payloads of all messages in mbox are equal to the local leader estimate.
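A sequential helper in the spirit of all_same might look as follows. This sketch assumes a mailbox represented as an array of payloads (the type `mbox_t` is hypothetical; the protocol code works on a list). The loop lives in a local function, so the UPDATE itself stays free of iterative control.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical payload-only view of a mailbox. */
typedef struct {
    const int *payloads;
    size_t len;
} mbox_t;

/* True iff every payload in mbox equals the local leader estimate.
 * Vacuously true for an empty mailbox; a caller that needs a quorum
 * must additionally check mbox.len, e.g. mbox.len > n / 2. */
bool all_same(mbox_t mbox, int leader) {
    for (size_t i = 0; i < mbox.len; i++)
        if (mbox.payloads[i] != leader)
            return false;
    return true;
}
```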

*Semantics.* The set of executions of an mHO protocol is defined by executing, in a loop, SEND followed by UPDATE for each round in the phase array. The initial configuration is defined by init. There are three predefined execution counters: the phase number, which is increased after a phase has been executed; the step number, which tracks which mHO-round is executed in the current phase; and the round number, which counts the total number of rounds executed so far and equals the phase number times the length of the phase array, plus the step number.
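The relation between the three counters amounts to a division with remainder. The C sketch below uses hypothetical names (`counters_t`, `round_of`, `counters_of`) and assumes all counters start at 0.

```c
#include <assert.h>

/* Execution counters of an mHO protocol whose phase array has phase_len rounds. */
typedef struct {
    unsigned phase; /* number of completed phases */
    unsigned step;  /* index of the current mHO-round within the phase */
} counters_t;

/* round = phase * phase_len + step */
unsigned round_of(counters_t c, unsigned phase_len) {
    assert(c.step < phase_len);
    return c.phase * phase_len + c.step;
}

/* Inverse mapping: recover phase and step from the round number r. */
counters_t counters_of(unsigned r, unsigned phase_len) {
    counters_t c = { r / phase_len, r % phase_len };
    return c;
}
```

For instance, with a phase array of length 2 (NewBallot, AckBallot, as in the running example), round 7 is step 1 of phase 3.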

A protocol state is a tuple ⟨SU, s, r, msg, P, *HO*⟩ where: P is the set of processes, SU ∈ {Send, Update} indicates the next transition, $s \in [P \rightarrow \mathsf{Vars} \rightarrow \mathcal{D}]$ stores the process local states, $r \in \mathbb{N}$ is the round number, $msg \subseteq P \times \mathcal{D}(M) \times P$ stores the in-transit messages, where M is the type of the message payload, and $HO \in [P \rightarrow 2^{P}]$ evaluates the *HO*-sets for the current round. After the initialization, an execution alternates Send and Update transitions. In the Send transition, all processes send messages, which are added to a pool of messages msg, without modifying the local states. The values of the HO sets are updated non-deterministically to be subsets of P. A message is lost if the sender's identity does not belong to the HO set of the receiver. In an Update transition, UPDATE is applied at each process, taking as input the set of messages received by that process in that round. If the processes communicate with an external process, then UPDATE might produce observable events $o_p$. These events correspond to calls to in, which returns an input value, and to out, which sends the value given as parameter to the client. At the end of the round, msg is purged and r is incremented. Figure 1(b) shows an execution of the mHO algorithm in Fig. 4.
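The Send/Update alternation can be simulated in C for a fixed number of processes. In this sketch the environment's non-deterministic choice of HO sets is passed in as a boolean matrix `ho`, with `ho[p][q]` true iff q ∈ HO(p, r). Everything else (the names, N = 4, and the UPDATE rule that keeps the minimum heard-of value) is a hypothetical example, not a protocol from the paper.

```c
#include <stdbool.h>

#define N 4 /* number of processes (the parameter n), fixed for the sketch */

typedef struct {
    int val[N];     /* local state s(p): here a single integer per process */
    unsigned round; /* round number r */
} config_t;

/* SEND has no side effects: here every process broadcasts its value. */
static int send_payload(const config_t *c, int sender) { return c->val[sender]; }

/* One mHO round: a Send transition followed by an Update transition.
 * ho[p][q] is true iff q is in HO(p, r); all other messages to p are lost. */
void mho_round(config_t *c, bool ho[N][N]) {
    int pool[N][N]; /* pool[q][p]: payload sent by q to p in this round */
    for (int q = 0; q < N; q++)
        for (int p = 0; p < N; p++)
            pool[q][p] = send_payload(c, q);

    /* UPDATE at every process; the pool was filled before any update, so
     * updating in place does not affect what other processes receive. */
    for (int p = 0; p < N; p++) {
        int best = c->val[p];
        for (int q = 0; q < N; q++)
            if (ho[p][q] && pool[q][p] < best)
                best = pool[q][p];
        c->val[p] = best;
    }
    /* End of round: the pool goes out of scope (purged), r is incremented. */
    c->round++;
}
```

With all HO sets empty, a round leaves every local state unchanged and only advances r.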

*Inter-procedural.* The model introduced so far allows one to express a single protocol, e.g., a leader election protocol (Fig. 4). However, realistic systems typically combine several protocols; e.g., we can transform Fig. 4 into a replicated state machine protocol by allowing processes to enter an atomic broadcast protocol in every ballot where a leader is elected successfully. Figure 7 sketches such an execution, where a subprotocol is called in the update of round AckBallot; its execution is sketched with thicker edges. In the subprotocol, the leader broadcasts client requests in a loop until it loses its quorum. When a follower does not receive a message from the leader, it considers the leader crashed, and control returns to the leader election protocol.

An inter-procedural mHO protocol differs from an intra-procedural one only in the UPDATE function: it may call another protocol and block until the call returns. An UPDATE may call at most one protocol on each path in its control flow (a sequence of calls can be implemented using multiple rounds). Thus, an inter-procedural mHO protocol is a collection of non-recursive mHO protocols, with a main protocol as entry point. Different protocols exchange messages of different types.

**Fig. 7.** Inter-procedural execution

#### **4 Formalizing Communication Closure Using Tags**

We introduce synchronization tags, which are program annotations that define communication-closed rounds within an asynchronous protocol.

**Definition 1 (Tag annotation).** *For a protocol* P*, a* tag annotation *is a tuple* (SyncV, tags, tagm, D, ⪯) *where:*


*The evaluation of a tag over* P*'s semantics is denoted* ([[tags]], [[tagm]])*, where*


*For every* $1 \leq i \leq m$*,* $v_{2i-1}$ *is called a* phase tag *and* $v_{2i}$ *is called a* step tag*. Given an execution* $\pi \in [\![P]\!]$*, a transition* $s \xrightarrow{A} s'$ *in* π *is* tagged *by* $[\![\mathit{tagm}]\!]_m$ *if* A *is* send(m) *or* m = recv(∗cond)*, and by* $[\![\mathit{tags}]\!]_s$ *otherwise.*

For Fig. 3, SyncV = ($v_1$, $v_2$), and tags matches $v_1$ and $v_2$ with ballot and label, respectively, at all control locations, i.e., a process is in step NewBallot of phase 3 when ballot = 3 and label = NewBallot. For the type msg, tagm matches the fields bal and lab with $v_1$ and $v_2$, respectively, i.e., a message (3, NewBallot, 5) is a phase 3, step NewBallot message. To capture that messages of type A are sent locally before messages of type B, the tagging function tagm(B) should be defined on the same synchronization variables as tagm(A).

**Definition 2 (Synchronization tag).** *Given a protocol* P*, an annotation tag* (SyncV, tags, tagm, D, ⪯) *is called a* synchronization tag *iff:*

*(I.) for any local execution* $\pi = s_0 A_0 s_1 A_1 \ldots \in [\![P]\!]_p$ *of a process* p*, the sequence* $[\![\mathit{tags}]\!]_{s_0} [\![\mathit{tags}]\!]_{s_1} [\![\mathit{tags}]\!]_{s_2} \ldots$ *is monotonically increasing w.r.t.* ⪯*.*

*Moreover,* $\forall j, j' \in [1..m],\ j < j'$*: if* $[\![\mathit{tags}]\!]^{(2j-1,2j)}_{s_i} \neq [\![\mathit{tags}]\!]^{(2j-1,2j)}_{s_{i+1}}$ *and* $[\![\mathit{tags}]\!]^{(2j'-1,2j')}_{s_i} \neq [\![\mathit{tags}]\!]^{(2j'-1,2j')}_{s_{i+1}}$ *then* $[\![\mathit{tags}]\!]^{(2j'-1,2j')}_{s_{i+1}} = (\bot_{2j'-1}, \bot_{2j'})$*, where* $[\![\mathit{tags}]\!]^{(2j-1,2j)}_{s_i}$ *is the projection of the tuple* $[\![\mathit{tags}]\!]_{s_i}$ *on the components* $2j-1$ *and* $2j$*,*

– *if* $m \neq \mathit{NULL}$ *then* $[\![\mathit{tags}]\!]_s \preceq [\![\mathit{tagm}]\!]_m$ *and* $[\![\mathit{tags}]\!]_s = [\![\mathit{tags}]\!]_{sr}$*, and*

– *if* $m = \mathit{NULL}$ *then* $s = sr$*,*

*(IV.) for any local execution* $\pi \in [\![P]\!]_p$*, if* $s \xrightarrow{stm} s'$ *is a transition of* π *such that*

– $s|_{\mathit{Vars}\setminus(M \cup \mathit{SyncV})} \neq s'|_{\mathit{Vars}\setminus(M \cup \mathit{SyncV})}$*, that is,* s *and* s' *differ on the variables that are neither of some message type nor in the image of* tags*, or*
– stm *is a* send, break, continue*, or* out()*,*

*then for all message type variables* m *in the protocol,* $[\![\mathit{tags}]\!]_s = [\![\mathit{tagm}]\!]_{m'}$*, where* m' *is the value of* m *in the state* s*, and for any* Mbox *variables of type set of messages,* $[\![\mathit{tags}]\!]_s = [\![\mathit{tagm}]\!]_m$ *for all* $m \in [\![\mathit{Mbox}]\!]_s$*,*

*(V.) for any local execution* $\pi \in [\![P]\!]_p$*, if* $s_1 \xrightarrow{\mathit{send}(m,\cdot)} s_2 \xrightarrow{stm^{+}} s_3 \xrightarrow{\mathit{send}(m',\cdot)} s_4$ *or* $s_1 \xrightarrow{m = \mathit{recv}(\ast cond)} s_2 \xrightarrow{stm^{+}} s_3 \xrightarrow{\mathit{send}(m',\cdot)} s_4$ *are sequences of transitions in* π*, then* $[\![\mathit{tagm}]\!]_m \prec [\![\mathit{tagm}]\!]_{m'}$*, where* stm *is any statement except* send *or* recv*. Moreover, if* $s_1 \xrightarrow{m = \mathit{recv}(\ast cond)} s_2 \xrightarrow{stm^{+}} s_3 \xrightarrow{m' = \mathit{recv}(\ast cond')} s_4$ *in* π*, then* $s_2|_{\mathit{Vars}\setminus(M \cup \mathit{SyncV})} = s_3|_{\mathit{Vars}\setminus(M \cup \mathit{SyncV})}$ *or* $[\![\mathit{tags}]\!]_{s_2} \prec [\![\mathit{tags}]\!]_{s_3}$*.*

*A protocol* P *is communication-closed, if there exists a synchronization tag for* P*.*

Condition (I.) states that SyncV is not decreased by any local statement (it is a notion of time). Further, only one synchronization pair is modified at a time, except for a reset (i.e., a pair is set to its minimal value) when the value of a preceding pair is updated. Checking this translates into checking a transition invariant stating that no assignment decreases the value of the synchronization tuple SyncV. To state this invariant, we introduce "old synchronization variables" that maintain the values of the synchronization variables before the update.
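The transition invariant can be illustrated by a runtime check that compares the old and new values of the synchronization tuple. In this C sketch, which is ours and not the Verifast encoding, the tuple has m = 2 pairs (e.g., an outer leader election and an inner broadcast), values are assumed non-negative, and ⊥ is encoded as -1 so that it lies below every real value; the names `sync_t`, `BOT`, and `cond_I_holds` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define BOT (-1) /* hypothetical encoding of the minimal value ⊥ */
#define M 2      /* number of (phase, step) pairs, i.e. nesting depth */

typedef struct { int v[2 * M]; } sync_t; /* (v1, ..., v2m) */

/* Lexicographic comparison of two synchronization tuples. */
static int sync_cmp(const sync_t *a, const sync_t *b) {
    for (size_t k = 0; k < 2 * M; k++)
        if (a->v[k] != b->v[k])
            return a->v[k] < b->v[k] ? -1 : 1;
    return 0;
}

/* Did the j-th (phase, step) pair (0-based) change? */
static bool pair_changed(const sync_t *a, const sync_t *b, size_t j) {
    return a->v[2 * j] != b->v[2 * j] || a->v[2 * j + 1] != b->v[2 * j + 1];
}

/* Condition (I.) as a transition invariant: `old` holds the old
 * synchronization variables before an assignment, `cur` the values after. */
bool cond_I_holds(const sync_t *old, const sync_t *cur) {
    if (sync_cmp(old, cur) > 0)
        return false; /* time went backwards */
    for (size_t j = 0; j < M; j++)
        for (size_t jj = j + 1; jj < M; jj++)
            if (pair_changed(old, cur, j) && pair_changed(old, cur, jj) &&
                (cur->v[2 * jj] != BOT || cur->v[2 * jj + 1] != BOT))
                return false; /* inner pair moved without being reset */
    return true;
}
```

Advancing the outer pair is accepted only if the inner pair is reset to (⊥, ⊥) or left unchanged.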

Condition (II.) states that any message sent is tagged with a timestamp that equals the current local time. Checking it reduces to an assert statement expressing that for every v ∈ SyncV, tagm(M)(v) = tags(pc)(v), where M is the type of the sent message m and pc is the program location of the send.

Condition (III.) states that any message received is tagged with a timestamp greater than or equal to the current time of the process. To check it, we need to consider the implementation of the functions passed as arguments to a recv statement. These functions (e.g., eq and geq in Fig. 3) implement the filtering of the messages delivered by the network. We inline their code and prove Condition (III.) by comparing the tagged fields of message variables with the phase and step variables. In Fig. 3, assert m->bal == ballot && m->lab == label after recv(eq(ballot, label)) checks this condition on the leader's branch.

Condition (IV.) states that if the local state of a process changes (except for changes of message type variables and synchronization variables), then all locally stored messages are timestamped with the current local time. That is, future messages cannot be "used" (no variable except message type variables can be written) before the phase and step tags are updated to match the highest timestamp. To check it, we need to prove a stronger property than the one for (III.). At each control location that writes to variables of primitive or composite type or to mailbox variables, the values of the phase (and step) variables must be equal to the phase (and step) tagged fields of all allocated message type objects. In Fig. 3, the statement assert(equal(mbox, ballot, label)) checks this condition on the leader's branch. It is a separation logic formula that uses the inductive list definition of mbox, which captures the contents of mbox.
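In Verifast, equal is a separation logic predicate; the executable C function below only illustrates what it states about the mailbox contents. The list layout (`struct mbox` with fields `bal`, `lab`, `next`) is a hypothetical stand-in for the inductive list used in Fig. 3.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical singly linked mailbox, mirroring an inductive list. */
struct mbox {
    int bal, lab;      /* timestamp fields of the stored message */
    struct mbox *next; /* rest of the list, NULL at the end */
};

/* Executable analogue of equal(mbox, ballot, label): every stored message
 * is stamped with exactly the current local time. True for an empty list. */
bool equal(const struct mbox *mb, int ballot, int label) {
    for (; mb != NULL; mb = mb->next)
        if (mb->bal != ballot || mb->lab != label)
            return false;
    return true;
}
```

For instance, a process at time (3, NewBallot) whose mailbox holds only a message stamped (4, NewBallot) fails the check; it must first advance its tags to (4, NewBallot) before writing its local state.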

The first four conditions imply that there is a global notion of time in the asynchronous protocol. However, they do not restrict the number of messages exchanged between two processes with the same timestamp. mHO restricts the message exchange: for every time value (corresponding to an mHO-round), processes first send, then receive messages, and then perform a computation without receiving or sending more messages before time is increased. Condition (V.) ensures that the asynchronous protocol has this structure. We check syntactically that the code meets these restrictions.
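The intended per-round shape can be illustrated on a local trace. Note that this is a different technique from the paper's syntactic check on the code: the C sketch below inspects a recorded sequence of actions, assuming tags never decrease (Condition (I.)) so that each round forms a contiguous segment. The names `act_kind_t`, `action_t`, and `follows_send_recv_update` are hypothetical, and the check simplifies Condition (V.) by treating every write to a non-message variable as computation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Kinds of local actions, in the order mHO allows within one round. */
typedef enum { ACT_SEND = 0, ACT_RECV = 1, ACT_COMPUTE = 2 } act_kind_t;

typedef struct {
    int tag;         /* the round (local time) the action belongs to */
    act_kind_t kind; /* ACT_COMPUTE: a write to a non-message variable */
} action_t;

/* Simplified Condition (V.) on one local trace: within a round, all sends
 * precede all receives, which precede all local computation. */
bool follows_send_recv_update(const action_t *trace, size_t len) {
    for (size_t i = 1; i < len; i++) {
        const action_t *prev = &trace[i - 1], *cur = &trace[i];
        if (cur->tag == prev->tag && cur->kind < prev->kind)
            return false; /* e.g. a send after a receive in the same round */
    }
    return true;
}
```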

Intuitively, each pair of synchronization variables uniquely identifies an mHO protocol. To rewrite an asynchronous protocol into nested (inter-procedural) mHO protocols, the tag of the inner protocol should include the tag of the outer one. The asynchronous code advances the time of one protocol at a time, that is, it modifies one synchronization pair at a time. The only exception is when inner protocols terminate: in this case, the time of the outer protocol is advanced, while the time of the inner one is reset. Moreover, different protocols exchange different message types. To be able to order the messages exchanged by an inner protocol w.r.t. the messages exchanged by an outer protocol, the inner protocol messages should also be tagged with the synchronization variables identifying the outer one. This is indeed the case in state machine replication algorithms, where the ballot (or view number), which is the tag of the outer leader election algorithm, also tags all the messages broadcast by the leader in the inner one.

#### **5 Reducing Asynchronous Executions**

We show that any execution of an asynchronous protocol that has a synchronization tag can be reduced to an indistinguishable mHO execution.

**Definition 3 (Indistinguishability).** *Given two executions* π *and* π' *of a protocol* P*, we say a process* p *cannot distinguish locally between* π *and* π' *w.r.t. a set of variables* W*, denoted* $\pi \simeq^{W}_{p} \pi'$*, if the projections of both executions on the sequence of states of* p*, restricted to the variables in* W*, agree up to finite stuttering, denoted* $\pi|_{p,W} \equiv \pi'|_{p,W}$*.*

*Two executions* π *and* π' *are* indistinguishable *w.r.t. a set of variables* W*, denoted* $\pi \simeq^{W} \pi'$*, iff no process can distinguish between them, i.e.,* $\forall p.\ \pi \simeq^{W}_{p} \pi'$*.*

The reduction preserves so-called local properties [7], among which are consensus and state machine replication.

**Definition 4 (Local properties).** *A property* φ *is* local *if for any two executions* a *and* b *that are indistinguishable,* a ⊨ φ *iff* b ⊨ φ*.*

**Theorem 1.** *If there exists a synchronization tag* (SyncV, tags, tagm, D, ⪯) *for* P*, then for every* $ae \in [\![P]\!]$ *there exists an mHO-execution* se *that is indistinguishable from* ae *w.r.t. all variables except for* M *or* Set(M) *variables; therefore* ae *and* se *satisfy the same local properties.*

*Proof Sketch.* There are two cases to consider. Case (1): every receive transition $s \xrightarrow{m = \mathit{recv}(\ast cond)} sr$ in ae satisfies $[\![\mathit{tags}]\!]_{sr} = [\![\mathit{tagm}]\!]_m$, i.e., all messages received are timestamped with the current local tag of the receiver. We use commutativity arguments to reorder transitions so that we obtain an indistinguishable asynchronous execution in which the transition tags are globally non-decreasing. The interesting case is when a send precedes a lower-tagged receive in ae. Then the tags of the two transitions imply that the transitions concern different messages, so swapping them cannot violate send/receive causality.

We exploit the fact that, in the protocols we consider, no correct process keeps its local tags unchanged forever (e.g., stays in a ballot forever), to arrive at an execution where each subsequence of transitions with the same tag is finite. Still, the resulting execution is not an mHO execution; e.g., for the same tag a receive may happen before a send on a different process. Condition (V.) ensures that the mHO send-receive-update order is respected locally at each process. From this, together with the observation that sends are left movers and updates are right movers, we obtain a global send-receive-update order, which implies that the resulting execution is an mHO execution.

Case (2): there is a transition $s \xrightarrow{m = \mathit{recv}(\ast cond)} sr$ in ae such that $[\![\mathit{tags}]\!]_{sr} \prec [\![\mathit{tagm}]\!]_m$, that is, a process receives a message with tag k', higher than its state tag k. In mHO, a process only receives messages for its current round. To bring the asynchronous execution into such a form, we use Condition (IV.) and the mHO semantics, where each process goes through all rounds. First, Condition (IV.) ensures that, if the process wants to use the content of the message, it must update its tag variables to k' at some point t after receiving it. It also ensures that the process stutters during the time instances between k and k', w.r.t. the values of the variables that are not of message type. That is, for the intermediate values of abstract time between k and k', no messages are sent or received, and no computation is performed. We split ae at point t and add empty send instructions, receive instructions, and instructions that increment the synchronization variables, until the tag reaches k'. If we do this for each jump in ae, we arrive at an indistinguishable asynchronous execution that falls into Case (1).
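The padding step of Case (2) can be sketched for the running example's time domain: given the tag k at which a process stood and the tag k' it jumps to, the code enumerates the intermediate values of abstract time, i.e., the rounds in which the process only stutters. This C sketch is ours; the names are hypothetical, and it assumes k ≺ k' (otherwise the loop would not terminate).

```c
#include <stddef.h>

typedef enum { NEW_BALLOT = 0, ACK_BALLOT = 1 } label_t;
typedef struct { unsigned ballot; label_t label; } tstamp_t;

/* Successor of t in the lexicographic order on T (two labels per ballot). */
tstamp_t next_time(tstamp_t t) {
    if (t.label == NEW_BALLOT)
        return (tstamp_t){ t.ballot, ACK_BALLOT };
    return (tstamp_t){ t.ballot + 1, NEW_BALLOT };
}

/* Enumerates the time values strictly between k and k2 (requires k < k2):
 * the rounds padded with empty sends, receives, and tag increments.
 * Stores at most cap of them in out and returns how many there are. */
size_t skipped_rounds(tstamp_t k, tstamp_t k2, tstamp_t *out, size_t cap) {
    size_t n = 0;
    for (tstamp_t t = next_time(k);
         t.ballot != k2.ballot || t.label != k2.label;
         t = next_time(t)) {
        if (n < cap)
            out[n] = t;
        n++;
    }
    return n;
}
```

For instance, a jump from (1, NewBallot) to (20, NewBallot) yields 37 intermediate values, from (1, AckBallot) to (19, AckBallot).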

#### **6 Rewriting of Asynchronous to mHO**

We introduce a rewriting algorithm that takes as input an asynchronous protocol P annotated with a synchronization tag and produces an mHO protocol whose executions are indistinguishable from the executions of P.

*Message Reception.* mHO receives all messages of a round at once, while in the asynchronous code messages are received one by one. By Condition (V.), receive steps that belong to the same round are separated only by instructions that store the messages in the mailbox. We consider that message reception is implemented in a simple while(true) loop (the innermost one); cf. the filled boxes in Fig. 3. Conditions (III.) and (IV.) ensure that all messages received in a loop belong to one round (the current one or the one the code will jump to after exiting the reception loop). Thus, we replace a reception loop by havoc and assume statements that subsume the possible effects of the loop, satisfying all the conditions on synchronization tags found in the original receive statements.
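The effect of replacing a reception loop can be illustrated in C by making the havoc'ed value explicit. In this hypothetical sketch, the bit mask `choice` stands for the havoc (one bit per delivered message) and the timestamp test stands for the assume; ranging over all masks yields exactly the subsets of the delivered messages stamped with the current time. We assume the filter is eq (current round only) and at most 64 delivered messages; `havoc_mbox` and `msg_t` are our names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int bal, lab, payload; } msg_t;

/* Havoc/assume replacement of a reception loop, made executable:
 * `choice` is the havoc'ed value (bit i: message i is in the mailbox), and
 * the timestamp test plays the role of assume(eq(ballot, label)). Over all
 * values of `choice`, the results are exactly the subsets of the delivered
 * messages stamped with the current time. Returns the mailbox size. */
size_t havoc_mbox(const msg_t *delivered, size_t len, uint64_t choice,
                  int ballot, int label, msg_t *mbox) {
    assert(len <= 64); /* one bit of `choice` per delivered message */
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        bool chosen = (choice >> i) & 1u;
        bool timely = delivered[i].bal == ballot && delivered[i].lab == label;
        if (chosen && timely)
            mbox[n++] = delivered[i];
    }
    return n;
}
```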

*Rewriting to an Intra-procedural mHO.* When the synchronization tag is defined over a pair of variables, the rewriting produces an intra-procedural mHO protocol. Recall that the values of the synchronization variables encode the round number, so that each update to a pair of synchronization variables marks the beginning of a new mHO round. The difficulty is that different execution prefixes may lead to the same values of the synchronization variables. To compute mHO rounds, the algorithm exploits the position of the updates to the synchronization variables in the control flow graph (CFG). We consider different CFG patterns, from the simplest to the most complicated one.

**Fig. 8.** Control flow graphs for rewriting. (Color figure online)

*Case 1:* The CFG is like in Fig. 8(a), i.e., it consists of one loop in which the phase tag ph is incremented once at the beginning of each iteration, and for every value of the step tag st there is exactly one assignment in the loop body (the same on all paths). In this case, the phase tag takes the same values as the loop iteration counter (possibly shifted by some initial value). Therefore, the loop body defines the code of an mHO-phase. It is easy to structure it into two mHO-rounds: the code of round A is the part of the CFG from the beginning of the loop's body up to the second assignment of the st variable, and round B is the rest of the code up to the end of the loop body.

*Case 2:* The CFG is like in Fig. 8(b). It differs from Case 1 in that the same value is assigned to st in different branches. Each of these assignments marks the beginning of an mHO round B, which thus has multiple entry points. In mHO, a round has only one entry point. To simulate the multiple entry points in mHO, we store in auxiliary variables the values of the conditions along the paths that led to the entry point. In the figure, the code of round A is given by the red box, and the code of round B by the condition in the first blue box, expressed on the auxiliary variable, followed by the respective branches in the blue box.

In our example in Fig. 3, the assignment label = AckBallot appears in both the leader and the follower branch. Followers send and receive AckBallot messages only if they have received a NewBallot message. The rewriting introduces old_mbox1 in the mHO protocol in Fig. 4 to store this information. Also, we eliminate the variables ballot and label; they are subsumed by the phase and round number of mHO.
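The role of the auxiliary variable can be sketched as follows. This is a hypothetical C rendering, not the content of Fig. 4: we assume that round A's UPDATE records the size of the NewBallot mailbox in `old_mbox1_len` and takes the leader from the first message, and that round B's single entry point replays the follower's branch by testing this variable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical follower state after the rewriting. */
typedef struct {
    int leader;           /* leader estimate learned in round A */
    size_t old_mbox1_len; /* auxiliary: size of the round-A mailbox */
} fstate_t;

/* Round A (NewBallot) UPDATE: remember whether the entry condition of the
 * follower's second box held, i.e. whether a NewBallot message arrived. */
void round_a_update(fstate_t *s, const int *mbox, size_t len) {
    s->old_mbox1_len = len;
    if (len > 0)
        s->leader = mbox[0];
}

/* Round B (AckBallot) SEND: the single entry point of round B first tests
 * the auxiliary variable, replaying the branch of the asynchronous code.
 * Returns true and sets *payload iff an AckBallot message is sent. */
bool round_b_send(const fstate_t *s, int *payload) {
    if (s->old_mbox1_len == 0)
        return false; /* this path never reached label = AckBallot */
    *payload = s->leader;
    return true;
}
```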

*Case 3:* Let us assume that the CFG is like in Fig. 8(c). It differs from Case 1 because the phase tag ph is assigned twice. We rewrite it into asynchronous code that falls into Case 1 or 2. The resulting CFG is sketched in Fig. 8(d), with only one assignment to ph at the beginning of the loop.

If the second assignment changes the value of ph, then there is a jump. In case of a jump, the beginning of a new phase does not coincide with the first instruction of the loop. Thus there might be multiple entry points for a phase. We introduce (non-deterministic) branching in the control flow to capture different entry points: in case there is no jump, the green edge followed by the purple edge is executed within the same phase. In case of a jump, the rewritten code allows the green and the purple paths to be executed in different phases; first the green, and then the purple in a later phase. We add empty loops to simulate the phases that are jumped over. As a purely non-deterministic choice at the top of the loop would be too imprecise, we use the variable jump to make sure that the purple edge is executed only once prior to the green edge. In case of multiple assignments, we perform this transformation iteratively for each assignment.

The protocol in Fig. 4 is obtained using two optimizations of the previous construction. First, we do not need empty loops. They are subsumed by the mHO semantics, as all local state changes are caused by some message reception. Thus, an empty loop is simulated by the execution of a phase with empty HO sets. Second, instead of adding jump variables, we reuse the non-deterministic value of mbox. This is possible because the jump is preconditioned by a cardinality constraint on mbox, and the green edge is empty (assignments to ballot and label correspond to ph++, and reception loops have been reduced to havoc statements).

*Nesting.* Cases 1–3 capture loops without nesting. Nested loops are rewritten into inter-procedural mHO protocols, using the structure of the tag annotations from Sect. 4. Each loop is rewritten into one protocol, starting with the innermost loop, using the procedure above. For each outer loop, the rewriting first replaces the nested loop with a call to the computed mHO protocol, and then applies the same rewriting procedure. Interpreting each loop as a protocol is pessimistic, and our rewriting may generate deeper nesting than necessary. Inner loops appearing on different branches may belong to the same sub-protocol, so that these different loops exchange messages. If tags associates different synchronization variables with different loops, then the rewriting builds one (sub-)protocol for each loop. Otherwise, the rewriting merges the loops into one mHO protocol. To soundly merge several loops into the same mHO protocol, the rewriting algorithm identifies the context in which the inner loop is executed.

**Theorem 2.** *Given an asynchronous protocol* P *annotated with a synchronization tag* (SyncV, tags, tagm, D, ⪯)*, the rewriting returns an inter-procedural mHO protocol* $P_{mHO}$ *whose executions are indistinguishable from the executions of* P*.*

#### **7 Experimental Results**

We implemented the rewriting procedure in a prototype tool ATHOS (https://github.com/alexandrumc/async-to-sync-translation). We applied it to several fault-tolerant distributed protocols. Figure 9 summarizes our results.

*Verification of Synchronization Tags.* The tool takes protocols in a C embedding of the language from Sect. 2 as input. We use a C embedding to be able to use Verifast [22] for checking the conditions in Sect. 4, i.e., the communication closure of an asynchronous protocol. Verifast is a deductive verification tool based on separation logic for sequential programs. Therefore, communication closure is specified in separation logic in our tool. To reason about sending and receiving messages, we inline every recv(∗cond) and use predefined specifications for send and recv. We consider only the prototype and the specification of these functions.

The user specifies the synchronization tag in a configuration file by (i) defining the number of (nested) protocols, (ii) giving, for each protocol, the phase and step variables, and (iii) giving, for each message type, the fields that encode the timestamp, i.e., the phase and step number. Figure 9 gives the names of the phase and step variables of our benchmarks. For now, we manually insert the specification to be proven, i.e., the assert statements that capture Conditions (I.) to (V.) in Sect. 4. In Fig. 9, column Async gives the size in LoC of the input asynchronous protocol, and +CC gives the size in LoC of the input annotated with the checks for communication closure (Conditions (I.) to (V.)) and their proofs.


**Fig. 9.** Benchmarks. The superscript \* identifies protocols that jump over phases. The superscript V marks protocols whose synchronous counterpart we verified.

*Benchmarks.* Our tool has rewritten several challenging benchmarks. The algorithm from [6, Fig. 6] solves consensus using a failure detector. It jumps to a specific decision round if a special decision message is received.

Multi-Paxos is the Paxos algorithm from [25] over sequences, without fast paths. The classic path is repeated as long as the leader is stable. Roughly, it performs a leader election similar to our running example (NewBallot is *Phase1a*). The difference is that the last all-to-all round is replaced by one back-and-forth exchange between the leader and its quorum: the leader receives n/2 acknowledgments that also contain the logs of its followers (*Phase1b*). The leader computes the maximal log and sends it to all (*Phase1aStart*). In a sub-protocol, a stable leader accepts client requests and broadcasts them one by one to its followers. The broadcast is implemented by three rounds, *Phase2aClassic*, *Phase2bClassic*, and *Learn*, and is repeated as long as the leader is stable.

ViewChange is a leader election algorithm similar to the one in ViewStamped [34]. Normal-Op is the sub-protocol used in ViewStamped to broadcast new commands from a stable leader. The last column of Fig. 9 gives the size of the mHO protocol computed by the rewriting. The implementation uses pycparser [3] to obtain the abstract syntax tree of the input protocol.

*Verification.* We verified the safety specification (agreement) of the mHO counterparts of the running example (Fig. 3), Normal-Op, and Multi-Paxos by deductive verification. We encoded the specifications of these algorithms (atomic broadcast, consensus, leader election) and their transition relations in Consensus Logic CL [13]. CL is a specification logic for expressing global properties of synchronous systems. It contains expressions for processes, values, sets, cardinalities, and set comprehension. The verification conditions are soundly discharged by an SMT solver; we used Z3 [33] in our experiments.

For Multi-Paxos, we did a modular proof. First, we prove the correctness of the sub-protocol Normal-Op, which implements a loop of atomic broadcasts (executed while the leader is stable). Then, we prove the outer leader-election loop correct by replacing the sub-protocol Normal-Op with its specification.

#### **8 Related Work and Conclusions**

Verification of asynchronous protocols has received a lot of attention in recent years. Mechanized verification techniques like IronFleet [21] and Verdi [41] were the first to address verification of state machine replication. Later, Disel [38] proposed a logic that makes the reasoning less protocol-specific, at the cost of proofs that use the entire message history. At the other end of the spectrum, model-checking-based techniques [2,4,20,23,24] are fully automated but more restricted in the protocols they apply to. In between, semi-automated techniques based on deductive verification, like natural proofs [12], Ivy [36], and PSync [14], try to minimize the user input for similar benchmarks.

We propose a technique that reduces the verification of an asynchronous protocol to that of a synchronous one, which simplifies the verification task regardless of the verification method chosen. We verified the resulting synchronous protocols with deductive verification based on [14]. Our technique uses the notion of communication closure [17], which we believe is the essence of any explicit or implicit synchrony in the system. We formalized a more general notion of communication closure that allows jumping over rounds, a catch-up mechanism essential to resynchronize and ensure liveness. Previous reduction techniques focus on shared-memory systems [16,27]; in contrast, we focus on message-passing concurrency.

The closest approaches are the results in [4,24] and [2,20], which also exploit the synchrony of the system. Compared to these approaches, our technique allows more general behaviors. For example, reasoning about stable leaders is possible because communication closure includes (for the first time) unbounded jumps. Also, we reduce to a stronger synchronous model: a round-based one instead of a peer-to-peer one, in which interleavings with respect to actions of other rounds are removed.

As future work, we will address the relation between communication closure and specific network assumptions, e.g., FIFO channels. We will also address a current limitation of communication closure: it cannot react to messages from the past. For instance, recovery protocols react to such messages.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Verification and Invariants

## **Interpolating Strong Induction**

Hari Govind Vediramana Krishnan<sup>1</sup>(B), Yakir Vizel<sup>2</sup>, Vijay Ganesh<sup>1</sup>, and Arie Gurfinkel<sup>1</sup>

> <sup>1</sup> University of Waterloo, Waterloo, Canada hgvedira@uwaterloo.ca <sup>2</sup> The Technion, Haifa, Israel

**Abstract.** The principle of strong induction, also known as k-induction, is one of the first techniques for unbounded SAT-based Model Checking (SMC). While elegant and simple to apply, properties as such are rarely k-inductive. When they can be strengthened, there is no effective strategy to guess the depth of induction. It has been mostly displaced by techniques that compute inductive strengthenings based on interpolation and property directed reachability (Pdr). In this paper, we present kAvy, an SMC algorithm that effectively uses k-induction to guide interpolation and Pdr-style inductive generalization. Unlike pure k-induction, kAvy uses Pdr-style generalization to compute and strengthen an inductive trace. Unlike pure Pdr, kAvy uses relative k-induction to construct an inductive invariant. The depth of induction is adjusted dynamically by minimizing a proof of unsatisfiability. We have implemented kAvy within the Avy Model Checker and evaluated it on HWMCC instances. Our results show that kAvy is more effective than both Avy and Pdr, and that using k-induction leads to faster running times and more solved instances. Further, on a class of benchmarks called *shift*, kAvy is orders of magnitude faster than Avy, Pdr, and k-induction.

#### **1 Introduction**

The principle of strong induction, also known as k-induction, is a generalization of (simple) induction that extends the base and inductive cases to k steps of a transition system [27]. A safety property P is k-inductive in a transition system T iff (a) P is true in the first (k − 1) steps of T, and (b) if P is assumed to hold for (k − 1) consecutive steps, then P holds in k steps of T. Simple induction is equivalent to 1-induction. Unlike induction, strong induction is complete for safety properties: a property P is safe in a transition system T iff there exists a natural number k such that P is k-inductive in T (assuming the usual restriction to simple paths). This makes k-induction a powerful method for unbounded SAT-based Model Checking (SMC).

Unlike other SMC techniques, strong induction reduces model checking to pure SAT and does not require any additional features such as solving with assumptions [12], interpolation [24], resolution proofs [17], Maximal Unsatisfiable Subsets (MUS) [2], etc. It easily integrates with existing SAT-solvers and immediately benefits from any improvements in heuristics [22,23], pre- and in-processing [18], and parallel solving [1]. The simplicity of applying k-induction made it the go-to technique for SMT-based infinite-state model checking [9,11,19]. In that context, it is particularly effective in combination with invariant synthesis [14,20]. Moreover, for some theories, strong induction is strictly stronger than 1-induction [19]: there are properties that are k-inductive but have no 1-inductive strengthening.

Notwithstanding all of its advantages, strong induction has been mostly displaced by more recent SMC techniques such as Interpolation [25], Property Directed Reachability [3,7,13,15], and their combinations [29]. In SMC, k-induction is equivalent to induction: any k-inductive property P can be strengthened to an inductive property Q [6,16]. Even though in the worst case Q is exponentially larger than P [6], this is rarely observed in practice [26]. Furthermore, the SAT queries get very hard as k increases and usually succeed only for rather small values of k. A recent work [16] shows that strong induction can be integrated in Pdr. However, [16] argues that k-induction is hard to control in the context of Pdr, since choosing a proper value of k is difficult and a wrong choice leads to a form of state enumeration. In [16], k is fixed to 5, and regular induction is used as soon as 5-induction fails.

In this paper, we present kAvy, an SMC algorithm that effectively uses k-induction to guide interpolation and Pdr-style inductive generalization. Like many state-of-the-art SMC algorithms, kAvy iteratively constructs candidate inductive invariants for a given safety property P. However, the construction of these candidates is driven by k-induction. Whenever P is known to hold up to a bound N, kAvy searches for the smallest k ≤ N + 1 such that either P or some strengthening of P is k-inductive. Once it finds the right k and strengthening, it computes a 1-inductive strengthening.

It is convenient to think of modern SMC algorithms (e.g., Pdr and Avy) and k-induction as two ends of a spectrum. At one end, modern SMC algorithms fix k to 1 and *search* for a 1-inductive strengthening of P. At the opposite end, k-induction fixes the strengthening of P to be P itself and *searches* for a k such that P is k-inductive. kAvy *dynamically* explores this spectrum, exploiting the interplay between finding the right k and finding the right strengthening.

As an example, consider the system in Fig. 1, which counts up to 64 and resets. The property p : c < 66 is 2-inductive. IC3, Pdr, and Avy iteratively guess a 1-inductive strengthening of p. In the worst case, they require at least 64 iterations. On the other hand, kAvy determines that p is 2-inductive after 2 iterations, *computes* a 1-inductive invariant (c ≠ 65) ∧ (c < 66), and terminates.

$$\begin{array}{l}
\texttt{reg [7:0] c = 0;}\\
\texttt{always}\\
\quad\texttt{if (c == 64)}\\
\quad\quad\texttt{c <= 0;}\\
\quad\texttt{else}\\
\quad\quad\texttt{c <= c + 1;}\\
\texttt{assert property (c < 66);}
\end{array}$$

**Fig. 1.** An example system.
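Because the register in Fig. 1 has only 256 values, the claims about it can be checked by brute force. The Python sketch below is an explicit-state illustration, not how a SAT-based checker works. `step` is our hand translation of the Verilog transition, assuming 8-bit wrap-around. `is_k_inductive` enumerates k consecutive steps instead of building an unrolling formula. Both names are ours.

```python
def step(c):
    """Transition of Fig. 1: reset to 0 after 64, otherwise increment (8-bit register)."""
    return 0 if c == 64 else (c + 1) % 256


STATES = range(256)  # every valuation of reg [7:0] c
INIT = 0


def is_k_inductive(inv, k):
    """Check that inv holds in the first k - 1 steps from INIT (base case), and that
    k consecutive inv-states are always followed by an inv-state (inductive step)."""
    c = INIT
    for _ in range(k):  # base case: states reached after 0, ..., k - 1 steps
        if not inv(c):
            return False
        c = step(c)
    for s in STATES:  # inductive step, starting from every state
        t = s
        for _ in range(k):
            if not inv(t):
                break  # the path leaves inv early, so the hypothesis does not apply
            t = step(t)
        else:
            if not inv(t):
                return False  # k inv-states followed by a violation
    return True


def p(c):
    return c < 66


def strengthened(c):
    return c != 65 and c < 66


print(is_k_inductive(p, 1), is_k_inductive(p, 2), is_k_inductive(strengthened, 1))
# False True True
```

The 1-induction check fails only at c = 65, which satisfies p but steps to 66. The 2-induction check succeeds because 65 has no predecessor within p.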

kAvy builds upon the foundations of Avy [29]. Avy first uses Bounded Model Checking [4] (BMC) to prove that the property P holds up to bound N. Then, it uses a sequence interpolant [28] and Pdr-style inductive generalization [7] to construct a 1-inductive strengthening candidate for P. We emphasize that using k-induction to construct 1-inductive candidates allows kAvy to efficiently utilize many principles from Pdr and Avy. While maintaining k-inductive candidates might seem attractive (since they may be smaller), they are also much harder to generalize effectively [7].

We implemented kAvy in the Avy Model Checker and evaluated it on the benchmarks from the Hardware Model Checking Competition (HWMCC). Our experiments show that kAvy significantly improves the performance of Avy and solves more examples than either Pdr or Avy. For a specific family of examples from [21], kAvy exhibits nearly constant time performance, compared to an exponential growth for Avy, Pdr, and k-induction (see Fig. 2b in Sect. 5). This further emphasizes the effectiveness of efficiently integrating strong induction into modern SMC.

The rest of the paper is structured as follows. After describing the most relevant related work, we present the necessary background in Sect. 2 and give an overview of SAT-based model checking algorithms in Sect. 3. kAvy is presented in Sect. 4, followed by the experimental results in Sect. 5. Finally, we conclude the paper in Sect. 6.

*Related Work.* kAvy builds on top of the ideas of IC3 [7] and Pdr [13]. The use of interpolation for generating an inductive trace is inspired by Avy [29]. While our algorithm is conceptually similar to Avy, its proof of correctness is non-trivial and significantly different from that of Avy. We are not aware of any other work that combines interpolation with strong induction.

There are two prior attempts at enhancing Pdr-style algorithms with k-induction. Pd-Kind [19] is an SMT-based Model Checking algorithm for infinite-state systems inspired by IC3/Pdr. It infers k-inductive invariants driven by the property, whereas kAvy infers 1-inductive invariants driven by k-induction. Pd-Kind uses recursive blocking with interpolation and model-based projection to block bad states, and k-induction to propagate (push) lemmas to the next level. While the algorithm is very interesting, it is hard to adapt to the SAT-based setting (i.e., SMC), and it cannot be compared directly on HWMCC instances.

The closest related work is KIC3 [16]. It modifies the counterexample queue management strategy in IC3 to utilize k-induction during blocking. Its main limitation is that the value of k must be chosen statically (k = 5 is reported for the evaluation). kAvy also utilizes k-induction during blocking, but computes the value of k dynamically. Unfortunately, the implementation of KIC3 is not publicly available, so we could not compare with it directly.

#### **2 Background**

In this section, we present the notation and background required for the description of our algorithm.

*Safety Verification.* A symbolic transition system T is a tuple (v̄, *Init*, *Tr*, *Bad*), where v̄ is a set of Boolean *state* variables. A state of the system is a complete valuation to all variables in v̄ (i.e., the set of states is {0, 1}<sup>|v̄|</sup>). We write v̄′ = {v′ | v ∈ v̄} for the set of *primed* variables, used to represent the next state. *Init* and *Bad* are formulas over v̄ denoting the set of initial states and bad states, respectively. *Tr* is a formula over v̄ ∪ v̄′ denoting the transition relation. With abuse of notation, we use formulas and the sets of states (or transitions) that they represent interchangeably. In addition, we sometimes use a state s to denote the formula (cube) that characterizes it. For a formula ϕ over v̄, we use ϕ(v̄′), or ϕ′ in short, to denote the formula in which every occurrence of v ∈ v̄ is replaced by v′ ∈ v̄′. For simplicity of presentation, we assume that the property P = ¬*Bad* is true in the initial state, that is, *Init* ⇒ P.

Given a formula ϕ(v̄), an M-to-N-*unrolling* of T, where ϕ holds in all intermediate states, is defined by the formula:

$$Tr[\varphi]\_M^N = \bigwedge\_{i=M}^{N-1} \varphi(\bar{v}\_i) \wedge Tr(\bar{v}\_i, \bar{v}\_{i+1}) \tag{1}$$

We write *Tr*[ϕ]<sup>N</sup> when M = 0, and *Tr*<sub>M</sub><sup>N</sup> when ϕ = ⊤.

A transition system T is UNSAFE iff there exists a state s ∈ *Bad* s.t. s is reachable, and is SAFE otherwise. Equivalently, T is UNSAFE iff there exists a number N such that the following *unrolling* formula is satisfiable:

$$\operatorname{Init}(\bar{v}\_0) \land \operatorname{Tr}^N \land \operatorname{Bad}(\bar{v}\_N) \tag{2}$$

T is SAFE if no such N exists. Whenever T is UNSAFE and s<sub>N</sub> ∈ *Bad* is a reachable state, the path from s<sub>0</sub> ∈ *Init* to s<sub>N</sub> is called a *counterexample*.
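For a finite, explicitly enumerated system, asking whether Eq. (2) is satisfiable for some N is a reachability question. The Python sketch below is an explicit-state stand-in for BMC, not a SAT encoding. Breadth-first search visits states in order of N, so the first bad state it dequeues yields a shortest counterexample s<sub>0</sub>, ..., s<sub>N</sub>. The function name and the toy 3-bit counter are our own illustrations.

```python
from collections import deque


def shortest_counterexample(init, succ, bad):
    """Return a shortest path s_0 (initial) ... s_N (bad), or None if T is SAFE.
    States must be hashable and must not be None."""
    parent = {s: None for s in init}
    queue = deque(parent)
    while queue:
        s = queue.popleft()
        if bad(s):
            path = []
            while s is not None:  # walk back to an initial state
                path.append(s)
                s = parent[s]
            return path[::-1]
        for t in succ(s):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return None  # all reachable states explored without reaching Bad


# Hypothetical 3-bit counter with Bad = (c == 5): the shortest counterexample has N = 5.
print(shortest_counterexample([0], lambda c: [(c + 1) % 8], lambda c: c == 5))
# [0, 1, 2, 3, 4, 5]
# The counter of Fig. 1 with Bad = (c >= 66) is SAFE.
print(shortest_counterexample([0], lambda c: [0 if c == 64 else (c + 1) % 256], lambda c: c >= 66))
# None
```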

An *inductive invariant* is a formula *Inv* that satisfies:

$$Init(\bar{v}) \Rightarrow Inv(\bar{v}) \qquad \qquad Inv(\bar{v}) \land \, Tr(\bar{v}, \bar{v}') \Rightarrow Inv(\bar{v}') \tag{3}$$

A transition system T is SAFE iff there exists an inductive invariant *Inv* s.t. Inv(¯v) <sup>⇒</sup> <sup>P</sup>(¯v). In this case we say that *Inv* is a *safe* inductive invariant.

The *safety* verification problem is to decide whether a transition system T is SAFE or UNSAFE, i.e., whether there exists a safe inductive invariant or a counterexample.

*Strong Induction.* Strong induction (or k-induction) is a generalization of the notion of an inductive invariant, similar to how "simple" induction is generalized in mathematics. A formula *Inv* is k*-invariant* in a transition system T if it is true in the first k steps of T. That is, the following formula is valid: *Init*(v̄<sub>0</sub>) ∧ *Tr*<sup>k</sup> ⇒ (⋀<sub>i=0</sub><sup>k</sup> *Inv*(v̄<sub>i</sub>)). A formula *Inv* is a k*-inductive invariant* iff *Inv* is a (k − 1)-invariant and is inductive after k steps of T, i.e., the following formula is valid: *Tr*[*Inv*]<sup>k</sup> ⇒ *Inv*(v̄<sub>k</sub>). Compared to simple induction, k-induction strengthens the hypothesis in the induction step: *Inv* is assumed to hold between steps 0 and k − 1 and is established in step k. Whenever *Inv* ⇒ P, we say that *Inv* is a safe k-inductive invariant. An inductive invariant is a 1-inductive invariant.

**Theorem 1.** *Given a transition system* T*. There exists a safe inductive invariant w.r.t.* T *iff there exists a safe* k*-inductive invariant w.r.t.* T*.*

Theorem 1 states that the k-induction principle is as complete as 1-induction. One direction is trivial (since we can take k = 1). The other direction holds in a stronger form: for every k-inductive invariant *Inv*<sub>k</sub>, there exists a 1-inductive strengthening *Inv*<sub>1</sub> such that *Inv*<sub>1</sub> ⇒ *Inv*<sub>k</sub>. Theoretically, *Inv*<sub>1</sub> might be exponentially bigger than *Inv*<sub>k</sub> [6]. In practice, both invariants tend to be of similar size.
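One standard way to see the strengthening direction keeps exactly the states from which *Inv*<sub>k</sub> holds along every path of at most k − 1 steps. This is not necessarily the construction used in [6]. Such a set contains the initial states, because *Inv*<sub>k</sub> is a (k − 1)-invariant. It is also closed under *Tr*, by the k-inductive step. The explicit-state Python sketch below assumes a finite state set and a successor function returning sets. All names are ours.

```python
def strengthen_to_1_inductive(inv, k, universe, succ):
    """Return Q = {s : inv holds in every state reachable from s within k - 1 steps}.
    If inv is k-inductive, Q is a 1-inductive invariant with Q => inv."""
    q = {s for s in universe if inv(s)}  # Q_0 = inv
    for _ in range(k - 1):
        # Q_{j+1}: states of Q_j all of whose successors are also in Q_j
        q = {s for s in q if succ(s) <= q}
    return q


def fig1_succ(c):
    """Successor set of the counter of Fig. 1 (8-bit, resets after 64)."""
    return {0 if c == 64 else (c + 1) % 256}


q = strengthen_to_1_inductive(lambda c: c < 66, 2, range(256), fig1_succ)
is_inductive = 0 in q and all(fig1_succ(s) <= q for s in q)
print(q == set(range(65)), is_inductive)  # True True
```

For Fig. 1, the construction drops only c = 65. It returns exactly the invariant (c ≠ 65) ∧ (c < 66) mentioned in the introduction.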

We say that a formula ϕ is k*-inductive relative* to F if it is a (k − 1)-invariant and *Tr*[ϕ ∧ F]<sup>k</sup> ⇒ ϕ(v̄<sub>k</sub>).
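Relative k-induction can be checked with forward images over an explicit state set. The idea is to collect the states that can end a path of k consecutive (ϕ ∧ F)-states, then test ϕ on their successors. The Python sketch below is illustrative only; `post`, the argument names, and the use of Fig. 1 as a test case are our assumptions. It shows that c < 66 is not 1-inductive on its own but becomes 1-inductive relative to the frame c ≠ 65.

```python
def post(states, succ):
    """Image of a set of states under the transition relation."""
    return {t for s in states for t in succ(s)}


def is_k_inductive_relative(phi, frame, k, universe, init, succ):
    """phi is a (k - 1)-invariant and Tr[phi ∧ frame]^k => phi(v_k)."""
    layer = set(init)
    for _ in range(k):  # base case: phi holds after 0, ..., k - 1 steps
        if not all(phi(s) for s in layer):
            return False
        layer = post(layer, succ)
    both = {s for s in universe if phi(s) and frame(s)}
    ends = both  # possible last states v_j of paths v_0 ... v_j inside phi ∧ frame
    for _ in range(k - 1):
        ends = post(ends, succ) & both
    return all(phi(t) for t in post(ends, succ))  # phi must hold in v_k


def fig1_succ(c):
    return {0 if c == 64 else (c + 1) % 256}


def phi(c):
    return c < 66


def top(c):
    return True


def not_65(c):
    return c != 65


U = range(256)
print(is_k_inductive_relative(phi, top, 1, U, {0}, fig1_succ),
      is_k_inductive_relative(phi, top, 2, U, {0}, fig1_succ),
      is_k_inductive_relative(phi, not_65, 1, U, {0}, fig1_succ))  # False True True
```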

*Craig Interpolation* [10]. We use an extension of Craig Interpolants to sequences, which is common in Model Checking. Let *A* = [A<sub>1</sub>, ..., A<sub>N</sub>] such that A<sub>1</sub> ∧ ··· ∧ A<sub>N</sub> is unsatisfiable. A *sequence interpolant* *I* = seqItp(*A*) for *A* is a sequence of formulas *I* = [I<sub>2</sub>, ..., I<sub>N</sub>] such that:

- (a) A<sub>1</sub> ⇒ I<sub>2</sub>,
- (b) ∀1 < i < N · I<sub>i</sub> ∧ A<sub>i</sub> ⇒ I<sub>i+1</sub>,
- (c) I<sub>N</sub> ∧ A<sub>N</sub> ⇒ ⊥, and
- (d) I<sub>i</sub> is over variables that are shared between the corresponding prefix and suffix of *A*.

#### **3 SAT-Based Model Checking**

In this section, we give a brief overview of SAT-based Model Checking algorithms: IC3/Pdr [7,13], and Avy [29]. While these algorithms are well-known, we give a uniform presentation and establish notation necessary for the rest of the paper. We fix a symbolic transition system T = (¯v,*Init*, *Tr* , *Bad*).

The main data-structure of these algorithms is a sequence of candidate invariants, called an *inductive trace*. An *inductive trace*, or simply a trace, is a sequence of formulas *F* = [F0,...,F<sup>N</sup> ] that satisfy the following two properties:

$$Init(\bar{v}) = F\_0(\bar{v}) \qquad \forall 0 \le i < N \cdot F\_i(\bar{v}) \land Tr(\bar{v}, \bar{v}') \Rightarrow F\_{i+1}(\bar{v}') \tag{4}$$

An element F<sub>i</sub> of a trace is called a *frame*. The index of a frame is called a *level*. *F* is *clausal* when all its elements are in CNF. For convenience, we view a frame as a set of clauses, and assume that a trace is padded with ⊤ up to the required length. The *size* of *F* = [F<sub>0</sub>, ..., F<sub>N</sub>] is |*F*| = N. For k ≤ N, we write *F*<sup>k</sup> = [F<sub>k</sub>, ..., F<sub>N</sub>] for the k-suffix of *F*.

A trace *F* of size N is *stronger* than a trace *G* of size M iff ∀0 ≤ i ≤ min(N, M) · F<sub>i</sub>(v̄) ⇒ G<sub>i</sub>(v̄). A trace is *safe* if each F<sub>i</sub> is safe: ∀i · F<sub>i</sub> ⇒ ¬*Bad*. It is *monotone* if ∀0 ≤ i < N · F<sub>i</sub> ⇒ F<sub>i+1</sub>. In a monotone trace, a frame F<sub>i</sub> over-approximates the set of states reachable in up to i steps of *Tr*. A trace is *closed* if ∃1 ≤ i ≤ N · F<sub>i</sub> ⇒ (⋁<sub>j=0</sub><sup>i−1</sup> F<sub>j</sub>).
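For small finite systems, these trace properties can be checked by enumerating states. In the Python sketch below, each frame is the set of states it denotes rather than a set of clauses. The sketch checks Eq. (4) together with safety, monotonicity, and closure. The names are our own, and the frames are taken from Fig. 1. In the second call, the trace is closed because its last frame, c ≤ 64, is already contained in an earlier frame. That frame is therefore a safe inductive invariant.

```python
def post(states, succ):
    return {t for s in states for t in succ(s)}


def trace_properties(frames, universe, init, succ, bad):
    """Check Eq. (4) and the safe / monotone / closed properties of an explicit trace.
    frames: predicates F_0, ..., F_N over states."""
    F = [{s for s in universe if f(s)} for f in frames]
    N = len(F) - 1
    return {
        "inductive": F[0] == set(init) and all(post(F[i], succ) <= F[i + 1] for i in range(N)),
        "safe": not any(bad(s) for frame in F for s in frame),
        "monotone": all(F[i] <= F[i + 1] for i in range(N)),
        "closed": any(F[i] <= set().union(*F[:i]) for i in range(1, N + 1)),
    }


def fig1_succ(c):
    return {0 if c == 64 else (c + 1) % 256}


def bad(c):
    return c >= 66


U, INIT = range(256), {0}
print(trace_properties([lambda c: c == 0, lambda c: c < 66], U, INIT, fig1_succ, bad))
# {'inductive': True, 'safe': True, 'monotone': True, 'closed': False}
print(trace_properties([lambda c: c == 0, lambda c: c <= 64, lambda c: c <= 64],
                       U, INIT, fig1_succ, bad)["closed"])
# True
```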

We define an unrolling formula of a k-suffix of a trace *F* = [F<sub>0</sub>, ..., F<sub>N</sub>] as:

$$Tr[F^k] = \bigwedge\_{i=k}^{|F|} F\_i(\bar{v}\_i) \wedge Tr(\bar{v}\_i, \bar{v}\_{i+1}) \tag{5}$$

We write *Tr*[*F*] to denote an unrolling of the 0-suffix of *F* (i.e., *F* itself). Intuitively, *Tr*[*F*<sup>k</sup>] is satisfiable iff there is an (N + 1 − k)-step execution of *Tr* that is consistent with the k-suffix *F*<sup>k</sup>. If a transition system T admits a safe trace *F* of size |*F*| = N, then T does not admit counterexamples of length at most N.

A safe trace *F* with |*F*| = N is *extendable* with respect to level 0 ≤ i ≤ N iff there exists a safe trace *G* stronger than *F* such that |*G*| > N and F<sub>i</sub> ∧ *Tr* ⇒ G′<sub>i+1</sub>. *G* and the corresponding level i are called an *extension trace* and an *extension level* of *F*, respectively. SAT-based model checking algorithms work by iteratively extending a given safe trace *F* of size N to a safe trace of size N + 1.

An extension trace is not unique, but there is a largest extension level. We denote the set of all extension levels of *F* by W(*F*). The existence of an extension level i implies that an unrolling of the i-suffix does not contain any *Bad* states:

**Proposition 1.** *Let F be a safe trace. Then i, with* 0 ≤ i ≤ N*, is an extension level of F iff the formula Tr*[*F*<sup>i</sup>] ∧ *Bad*(v̄<sub>N+1</sub>) *is unsatisfiable.*

*Example 1.* For Fig. 1, *F* = [c = 0, c < 66] is a safe trace of size 1. The formula (c < 66) ∧ *Tr* ∧ ¬(c′ < 66) is satisfiable. Therefore, there does not exist an extension trace at level 1. Since (c = 0) ∧ *Tr* ∧ (c′ < 66) ∧ *Tr*′ ∧ (c″ ≥ 66) is unsatisfiable, the trace is extendable at level 0. For example, a valid extension trace at level 0 is *G* = [c = 0, c < 2, c < 66].
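Proposition 1 turns into a direct computation of W(*F*) when frames are explicit state sets. For each i, the Python sketch below propagates the states allowed by *Tr*[*F*<sup>i</sup>] through the suffix and tests whether a bad state can follow at position N + 1. It is an illustration with our own names, not a SAT-based procedure. For the trace of Example 1, it reports W(*F*) = {0}.

```python
def post(states, succ):
    return {t for s in states for t in succ(s)}


def extension_levels(frames, universe, succ, bad):
    """W(F) via Proposition 1: i is an extension level iff Tr[F^i] ∧ Bad(v_{N+1}) is unsat."""
    F = [{s for s in universe if f(s)} for f in frames]
    N = len(F) - 1
    levels = []
    for i in range(N + 1):
        layer = F[i]  # v_i in F_i
        for j in range(i + 1, N + 1):
            layer = post(layer, succ) & F[j]  # v_j in F_j for j = i + 1, ..., N
        if not any(bad(t) for t in post(layer, succ)):  # no bad v_{N+1}
            levels.append(i)
    return levels


def fig1_succ(c):
    return {0 if c == 64 else (c + 1) % 256}


print(extension_levels([lambda c: c == 0, lambda c: c < 66], range(256), fig1_succ,
                       lambda c: c >= 66))  # [0]
```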

Both Pdr and Avy iteratively extend a safe trace until either the trace is closed or a counterexample is found. However, they differ in how exactly the trace is extended. In the rest of this section, we present Avy and Pdr through the lens of extension levels. The goal of this presentation is to make the paper self-contained. We omit many important optimization details and refer the reader to the original papers [7,13,29].

Pdr maintains a monotone, clausal trace *F* with *Init* as the first frame (F<sub>0</sub>). The trace *F* is extended by recursively computing and blocking (if possible) states that can reach *Bad* (called *bad states*). A bad state is blocked at the largest level possible.

Algorithm 1 shows PdrBlock, the backward search procedure that identifies and blocks bad states. PdrBlock maintains a queue of states and the levels at which they have to be blocked. The smallest level at which blocking occurs is tracked in order to show the construction of the extension trace. For each state s in the queue, PdrBlock checks whether s can be blocked by the previous frame F<sub>d−1</sub> (line 5). If not, a predecessor state t of s that satisfies F<sub>d−1</sub> is computed and added to the queue (line 7). If a predecessor state is found at level 0, the trace is not extendable and an empty trace is returned. If the state s is blocked at level d, PdrIndGen is called to generate a clause that blocks s and possibly other states. The clause is then added to all the frames at levels less than or equal to d. PdrIndGen is a crucial optimization in Pdr; however, we do not explain it for the sake of simplicity. The procedure terminates when there are no more states to be blocked (or a counterexample was found at line 4). By construction, the output trace *G* is an extension trace of *F* at the extension level w.

Once Pdr extends its trace, PdrPush is called to check whether the clauses it learned are also true at higher levels. Pdr terminates when the trace is closed.
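To make the backward search concrete, here is an explicit-state Python sketch of the blocking loop described above. It is our own simplification, not Algorithm 1 itself:

- states are enumerated explicitly, and a frame is the set of states it denotes;
- "adding a clause that blocks s" becomes removing s from frames 1 to d;
- PdrIndGen and the tracking of the extension level w are omitted;
- `pre` (a predecessor function) is an assumed input.

Bad states are blocked at level N + 1, and the lowest-level obligation is always processed first.

```python
import heapq


def pdr_block(frames, universe, pre, bad):
    """Explicit-state sketch of PdrBlock without PdrIndGen.

    frames: sets F_0 (= Init), ..., F_N of a monotone, safe inductive trace.
    pre(s): set of predecessors of state s; states must be orderable (heap entries).
    Returns an extension trace G_0, ..., G_{N+1}, or [] if a counterexample exists.
    """
    G = [set(f) for f in frames] + [set(universe)]  # new frame G_{N+1} starts as "true"
    queue = [(len(G) - 1, s) for s in G[-1] if bad(s)]  # block every bad state at N + 1
    heapq.heapify(queue)
    while queue:
        d, s = heapq.heappop(queue)  # lowest level first
        if s not in G[d]:
            continue  # already blocked at this level
        preds = pre(s) & G[d - 1]
        if preds:
            if d - 1 == 0:
                return []  # s has an initial predecessor: not extendable
            heapq.heappush(queue, (d, s))  # retry s after its predecessor is handled
            heapq.heappush(queue, (d - 1, min(preds)))
        else:
            for j in range(1, d + 1):  # "add the blocking clause" to levels 1..d
                G[j].discard(s)
    return G


def fig1_step(c):
    return 0 if c == 64 else (c + 1) % 256


U = range(256)
PRE = {s: set() for s in U}
for c in U:
    PRE[fig1_step(c)].add(c)

G = pdr_block([{0}, set(range(66))], U, lambda s: PRE[s], lambda c: c >= 66)
print(G[1] == set(range(65)), G[2] == set(range(66)))  # True True
cex = pdr_block([{0}, {0, 1}], range(8), lambda s: {(s - 1) % 8}, lambda c: c == 2)
print(cex)  # []
```

On Fig. 1 with *F* = [c = 0, c < 66], blocking the bad states forces state 65 out of level 1. G<sub>1</sub> therefore becomes c ≠ 65 ∧ c < 66, represented as the states 0 to 64. In the second call, state 2 is reachable from the initial state 0 in two steps, so an empty trace is returned.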


Avy, shown in Algorithm 2, is an alternative to Pdr that combines interpolation and recursive blocking. Avy starts with a trace *F*, with F<sub>0</sub> = *Init*, that is extended in every iteration of the main loop. A counterexample is returned whenever *F* is not extendable (line 3). Otherwise, a sequence interpolant is extracted from the unsatisfiability of *Tr*[*F*<sup>max(W)</sup>] ∧ *Bad*(v̄<sub>N+1</sub>). A longer trace *G* = [G<sub>0</sub>, ..., G<sub>N</sub>, G<sub>N+1</sub>] is constructed using the sequence interpolant (line 7). Observe that *G* is an extension trace of *F*. While *G* is safe, it is neither monotone nor clausal. A helper routine, AvyMkTrace, converts *G* to a proper Pdr trace on line 8 (see [29] for details on AvyMkTrace). Avy converges when the trace is closed.

#### **4 Interpolating** *k***-Induction**

In this section, we present kAvy, an SMC algorithm that uses the principle of strong induction to extend an inductive trace. The section is structured as follows. First, we introduce the concept of extending a trace using relative k-induction. Second, we present kAvy and describe how k-induction is used to compute an extended trace. Third, we describe two techniques for computing maximal parameters to apply strong induction. Unless stated otherwise, we assume that all traces are monotone.

A safe trace *F*, with |*F*| = N, is *strongly extendable* with respect to (i, k), where 1 ≤ k ≤ i + 1 ≤ N + 1, iff there exists a safe inductive trace *G* stronger than *F* such that |*G*| > N and *Tr*[F<sub>i</sub>]<sup>k</sup> ⇒ G<sub>i+1</sub>(v̄<sub>k</sub>). We refer to the pair (i, k) as a *strong extension level (SEL)*, and to the trace *G* as an (i, k)*-extension trace*, or simply a *strong extension trace (SET)* when (i, k) is not important. Note that for k = 1, *G* is just an extension trace.

*Example 2.* For Fig. 1, the trace *F* = [c = 0, c < 66] is strongly extendable at level 1. A valid (1, 2)-extension trace is *G* = [c = 0, (c ≠ 65) ∧ (c < 66), c < 66]. Note that (c < 66) is 2-inductive relative to F<sub>1</sub>, i.e., *Tr*[F<sub>1</sub>]<sup>2</sup> ⇒ (c < 66).

We write K(*F*) for the set of all SELs of *F*. We define an order ≺ on SELs by: (i<sub>1</sub>, k<sub>1</sub>) ≺ (i<sub>2</sub>, k<sub>2</sub>) iff (i) i<sub>1</sub> < i<sub>2</sub>; or (ii) i<sub>1</sub> = i<sub>2</sub> ∧ k<sub>1</sub> > k<sub>2</sub>. The maximal SEL is max(K(*F*)).


**Input:** A transition system T = (*Init*, *Tr*, *Bad*)

**Output:** safe/unsafe

1. *F* ← [*Init*]; N ← 0
2. **repeat** // Invariant: *F* is a monotone, clausal, safe, inductive trace
3. U ← *Tr*[*F*<sup>0</sup>] ∧ *Bad*(v̄<sub>N+1</sub>)
4. **if** isSat(U) **then return** unsafe
5. (i, k) ← max{(i, k) | ¬isSat(*Tr*‖*F*<sup>i</sup>‖<sup>k</sup> ∧ *Bad*(v̄<sub>N+1</sub>))}
6. [F<sub>0</sub>, ..., F<sub>N+1</sub>] ← kAvyExtend(*F*, (i, k))
7. [F<sub>0</sub>, ..., F<sub>N+1</sub>] ← PdrPush([F<sub>0</sub>, ..., F<sub>N+1</sub>])
8. **if** ∃1 ≤ i ≤ N · F<sub>i</sub> ⇒ (⋁<sub>j=0</sub><sup>i−1</sup> F<sub>j</sub>) **then return** safe
9. N ← N + 1
10. **until** ∞

Note that the existence of an SEL (i, k) means that an unrolling of the i-suffix with F<sub>i</sub> repeated k times does not contain any bad states. We use *Tr*‖*F*<sup>i</sup>‖<sup>k</sup> to denote this *characteristic formula* for SEL (i, k):

$$\operatorname{Tr}\|\mathbf{F}^{i}\|^{k} = \begin{cases} \operatorname{Tr}[F\_{i}]\_{i+1-k}^{i+1} \wedge \operatorname{Tr}[\mathbf{F}^{i+1}] & \text{if } 0 \le i < N\\ \operatorname{Tr}[F\_{N}]\_{N+1-k}^{N+1} & \text{if } i = N \end{cases} \tag{6}$$

**Proposition 2.** *Let F be a safe trace, where* |*F*| = N*. Then* (i, k)*, with* 1 ≤ k ≤ i + 1 ≤ N + 1*, is an SEL of F iff the formula Tr*‖*F*<sup>i</sup>‖<sup>k</sup> ∧ *Bad*(v̄<sub>N+1</sub>) *is unsatisfiable.*
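When frames are explicit state sets, Proposition 2 and the order ≺ translate directly into a brute-force search for the maximal SEL. The Python sketch below is our own illustration: the names and the explicit-state setting are assumptions, whereas kAvy itself works with SAT queries and the techniques described later in this section. `is_sel` follows Eq. (6): k consecutive states in F<sub>i</sub>, then the suffix F<sub>i+1</sub>, ..., F<sub>N</sub>. The sort key (i, −k) encodes ≺: a larger i wins, and for equal i a smaller k wins. On the trace of Examples 2 and 3, the search finds (1, 2).

```python
def post(states, succ):
    return {t for s in states for t in succ(s)}


def is_sel(F, i, k, succ, bad):
    """Proposition 2: (i, k) is an SEL iff Tr||F^i||^k ∧ Bad(v_{N+1}) is unsat (Eq. 6)."""
    N = len(F) - 1
    layer = F[i]
    for _ in range(k - 1):
        layer = post(layer, succ) & F[i]  # k consecutive states in F_i
    for j in range(i + 1, N + 1):
        layer = post(layer, succ) & F[j]  # then the suffix F_{i+1}, ..., F_N
    return not any(bad(t) for t in post(layer, succ))


def max_sel(F, succ, bad):
    """Maximal SEL w.r.t. the order: larger i first, then smaller k."""
    N = len(F) - 1
    sels = [(i, k) for i in range(N + 1) for k in range(1, i + 2) if is_sel(F, i, k, succ, bad)]
    return max(sels, key=lambda ik: (ik[0], -ik[1]), default=None)


def fig1_succ(c):
    return {0 if c == 64 else (c + 1) % 256}


F = [{0}, set(range(66))]  # the trace [c = 0, c < 66]
print(max_sel(F, fig1_succ, lambda c: c >= 66))  # (1, 2)
```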

The level i in the maximal SEL (i, k) of a given trace *F* is greater than or equal to the maximal extension level of *F*:

**Lemma 1.** *Let* (i, k) = max(K(*F*))*, then* <sup>i</sup> <sup>≥</sup> max(W(*F*))*.*

Hence, extensions based on the maximal SEL are constructed from frames at higher levels than extensions based on the maximal extension level.

*Example 3.* For Fig. 1, the trace [c = 0,c < 66] has a maximum extension level of 0. Since (c < 66) is 2-inductive, the trace is strongly extendable at level 1 (as was seen in Example 2).

**kAvy Algorithm.** kAvy is shown in Fig. 3. It starts with an inductive trace *F* = [*Init*] and iteratively extends *F* using SELs. A counterexample is returned if the trace cannot be extended (line 4). Otherwise, kAvy computes the maximal SEL (line 5), as described in Sect. 4.2. Then, it constructs a strong extension trace using kAvyExtend (line 6), as described in Sect. 4.1. Finally, PdrPush is called, and the algorithm checks whether the trace is closed (line 8). Note that *F* is a monotone, clausal, safe inductive trace throughout the algorithm.

#### **4.1 Extending a Trace with Strong Induction**

In this section, we describe the procedure kAvyExtend (shown in Algorithm 4). Given a trace *F* of size |*F*| = N and an SEL (i, k) of *F*, it constructs an (i, k)-extension trace *G* of size |*G*| = N + 1. The procedure itself is fairly simple, but its proof of correctness is complex. We first present the theoretical results that connect sequence interpolants with strong extension traces, then the procedure, and then details of its correctness. Throughout the section, we fix a trace *F* and its SEL (i, k).

*Sequence Interpolation for SEL.* Let (i, k) be an SEL of *F*. By Proposition 2, $\Psi = \operatorname{Tr}\Vert\mathbf{F}^{i}\Vert^{k} \wedge \operatorname{Bad}(\bar{v}\_{N+1})$ is unsatisfiable. Let $A = \lbrace A\_{i-k+1}, \ldots, A\_{N+1} \rbrace$ be a partitioning of Ψ defined as follows:

$$A\_{j} = \begin{cases} F\_{i}(\bar{v}\_{j}) \wedge \operatorname{Tr}(\bar{v}\_{j}, \bar{v}\_{j+1}) & \text{if } i - k + 1 \le j \le i \\ F\_{j}(\bar{v}\_{j}) \wedge \operatorname{Tr}(\bar{v}\_{j}, \bar{v}\_{j+1}) & \text{if } i < j \le N \\ \operatorname{Bad}(\bar{v}\_{N+1}) & \text{if } j = N + 1 \end{cases}$$

Since $\bigwedge A = \Psi$, A is unsatisfiable. Let $\mathbf{I} = [I\_{i-k+2}, \ldots, I\_{N+1}]$ be a sequence interpolant corresponding to A. Then, *I* satisfies the following properties:

$$\begin{aligned} & F\_i \wedge \operatorname{Tr} \Rightarrow I'\_{i-k+2} & \quad & \forall i-k+2 \le j \le i \cdot (F\_i \wedge I\_j) \wedge \operatorname{Tr} \Rightarrow I'\_{j+1} & \quad & (\heartsuit) \\ & I\_{N+1} \Rightarrow \neg \operatorname{Bad} & & \forall i < j \le N \cdot (F\_j \wedge I\_j) \wedge \operatorname{Tr} \Rightarrow I'\_{j+1} & & \end{aligned}$$

Note that in (♥), both i and k are fixed: they form the SEL (i, k). Furthermore, in the top row $F\_i$ is fixed as well.
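The following minimal sketch makes Proposition 2 and (♥) concrete. It uses a toy explicit-state encoding, which is our assumption and not kAvy's SAT-based machinery: every formula is a finite set of states and *Tr* is a successor map. On this encoding, the strongest sequence interpolant for the partitioning A is obtained by taking forward images. The names `post` and `strongest_seq_itp` are hypothetical.

```python
from typing import Dict, List, Optional, Set

State = int
Succ = Dict[State, List[State]]


def post(states: Set[State], succ: Succ) -> Set[State]:
    """Forward image of a set of states under the transition relation."""
    return {t for s in states for t in succ.get(s, [])}


def strongest_seq_itp(frames: List[Set[State]], succ: Succ, bad: Set[State],
                      i: int, k: int) -> Optional[Dict[int, Set[State]]]:
    """Return {m: I_m} for m = i-k+2 .. N+1, or None if (i, k) is not an SEL.

    frames[j] is the state set of F_j and N = len(frames) - 1. Positions
    i-k+1 .. i are constrained by F_i, positions i+1 .. N by F_j, and
    position N+1 by Bad, mirroring the partitioning A.
    """
    n = len(frames) - 1
    assert 1 <= k <= i + 1 <= n + 1, "(i, k) outside the range of Proposition 2"
    itp = {i - k + 2: post(frames[i], succ)}        # A_{i-k+1} = F_i /\ Tr
    for j in range(i - k + 2, n + 1):
        frame = frames[i] if j <= i else frames[j]
        itp[j + 1] = post(frame & itp[j], succ)     # (F /\ I_j) /\ Tr => I'_{j+1}
    if itp[n + 1] & bad:
        return None   # Psi is satisfiable: a bad state is reached
    return itp        # I_{N+1} excludes Bad, so Psi is unsatisfiable
```

Consider a made-up system with transitions 0 → 1, 1 → 0, 2 → 3, 3 → 4, 4 → 4, Bad = {4}, and safe trace F = [{0}, {0, 1, 2}, {0, 1, 2, 3}]. Its maximal extension level is 0, yet (1, 2) and (2, 3) are SELs. For (1, 2), the sketch yields I<sub>1</sub> = {0, 1, 3} and I<sub>2</sub> = I<sub>3</sub> = {0, 1}. Interpolants produced by a SAT solver are generally weaker than these exact images.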

The conjunction of the first k interpolants in *I* is k-inductive relative to the frame Fi:

**Lemma 2.** *The formula* $F\_{i+1} \wedge \left(\bigwedge\_{m=i-k+2}^{i+1} I\_m\right)$ *is k-inductive relative to* $F\_i$.

*Proof.* Since $F\_i$ and $F\_{i+1}$ are consecutive frames of a trace, $F\_i \wedge \operatorname{Tr} \Rightarrow F'\_{i+1}$. Thus, $\forall i-k+2 \le j \le i \cdot \operatorname{Tr}[F\_i]\_{i-k+2}^{j} \Rightarrow F\_{i+1}(\bar{v}\_{j+1})$. Moreover, by (♥), $F\_i \wedge \operatorname{Tr} \Rightarrow I'\_{i-k+2}$ and $\forall i-k+2 \le j \le i \cdot (F\_i \wedge I\_j) \wedge \operatorname{Tr} \Rightarrow I'\_{j+1}$. Equivalently, $\forall i-k+2 \le j \le i \cdot \operatorname{Tr}[F\_i]\_{i-k+2}^{j} \Rightarrow I\_{j+1}(\bar{v}\_{j+1})$. By induction over the difference between (i + 1) and (i − k + 2), we show that $\operatorname{Tr}[F\_i]\_{i-k+2}^{i+1} \Rightarrow \left(F\_{i+1} \wedge \bigwedge\_{m=i-k+2}^{i+1} I\_m\right)(\bar{v}\_{i+1})$, which concludes the proof.
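On small explicit-state systems, relative k-induction can be checked by brute force. The sketch below assumes the following reading: φ is k-inductive relative to F when every path whose first k states satisfy F ∧ φ reaches a state satisfying φ after k steps. This reading, the successor-map encoding and the name `is_k_inductive` are our assumptions for illustration.

```python
from typing import Callable, Dict, Iterable, List

State = int
Pred = Callable[[State], bool]


def is_k_inductive(phi: Pred, frame: Pred, succ: Dict[State, List[State]],
                   states: Iterable[State], k: int) -> bool:
    """True iff every path s_0 .. s_k whose states s_0 .. s_{k-1} all satisfy
    frame /\\ phi ends in a state s_k that satisfies phi."""
    if k < 1:
        raise ValueError("k must be at least 1")

    def ok(s: State) -> bool:
        return frame(s) and phi(s)

    # End states of length-1 path prefixes that stay inside frame /\ phi.
    frontier = {s for s in states if ok(s)}
    for _ in range(k - 1):
        # Extend each prefix by one transition, still inside frame /\ phi.
        frontier = {t for s in frontier for t in succ.get(s, []) if ok(t)}
    # The k-th transition must land in phi.
    return all(phi(t) for s in frontier for t in succ.get(s, []))
```

The toy system has transitions 0 → 1, 1 → 0, 2 → 3, 3 → 4, 4 → 4. On it, φ = (s ≠ 4) is 3-inductive but not 2-inductive. It becomes 1-inductive relative to the frame {0, 1, 2}, which shows how a stronger frame lowers the induction depth needed.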

We use Lemma 2 to define a strong extension trace *G*:

**Lemma 3.** *Let* $\mathbf{G} = [G\_0, \ldots, G\_{N+1}]$ *be an inductive trace defined as follows:*

$$G\_j = \begin{cases} F\_j & \text{if } 0 \le j < i - k + 2 \\ F\_j \wedge \left(\bigwedge\_{m=i-k+2}^{j} I\_m\right) & \text{if } i - k + 2 \le j < i + 2 \\ F\_j \wedge I\_j & \text{if } i + 2 \le j < N + 1 \\ I\_{N+1} & \text{if } j = N + 1 \end{cases}$$

*Then, G is an (i, k)-extension trace of F (not necessarily monotone).*

*Proof.* By Lemma 2, $G\_{i+1}$ is k-inductive relative to $F\_i$. Therefore, it is sufficient to show that *G* is a safe inductive trace that is stronger than *F*. By definition, $\forall 0 \le j \le N \cdot G\_j \Rightarrow F\_j$. By (♥), $F\_i \wedge \operatorname{Tr} \Rightarrow I'\_{i-k+2}$ and $\forall i-k+2 \le j \le i \cdot (F\_i \wedge I\_j) \wedge \operatorname{Tr} \Rightarrow I'\_{j+1}$. By induction over j, $\left(F\_i \wedge \bigwedge\_{m=i-k+2}^{j} I\_m\right) \wedge \operatorname{Tr} \Rightarrow \bigwedge\_{m=i-k+2}^{j+1} I'\_m$ for all $i-k+2 \le j \le i$. Since *F* is monotone, $\forall i-k+2 \le j \le i \cdot \left(F\_j \wedge \bigwedge\_{m=i-k+2}^{j} I\_m\right) \wedge \operatorname{Tr} \Rightarrow \bigwedge\_{m=i-k+2}^{j+1} I'\_m$.

By (♥), $\forall i < j \le N \cdot (F\_j \wedge I\_j) \wedge \operatorname{Tr} \Rightarrow I'\_{j+1}$. Again, since *F* is a trace, we conclude that $\forall i < j < N \cdot (F\_j \wedge I\_j) \wedge \operatorname{Tr} \Rightarrow (F\_{j+1} \wedge I\_{j+1})'$. Combining the above, $G\_j \wedge \operatorname{Tr} \Rightarrow G'\_{j+1}$ for $0 \le j \le N$. Since *F* is safe and $I\_{N+1} \Rightarrow \neg \operatorname{Bad}$, *G* is safe and stronger than *F*.
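The four cases of Lemma 3 are easy to mirror on a toy explicit-state encoding. In this encoding, each formula is a finite set of states and *Tr* is a successor map, which is our assumption. The sketch builds *G* from the frames of *F* and a given sequence interpolant, then checks that the result is a safe trace stronger than *F*. `build_extension` and `is_safe_trace` are hypothetical helper names, and the interpolant is passed in rather than computed.

```python
from typing import Dict, List, Set

State = int


def build_extension(frames: List[Set[State]], itp: Dict[int, Set[State]],
                    i: int, k: int) -> List[Set[State]]:
    """Frames G_0 .. G_{N+1} following the four cases of Lemma 3.

    frames[j] is F_j (N = len(frames) - 1); itp[m] is I_m for m = i-k+2 .. N+1.
    """
    n = len(frames) - 1
    g: List[Set[State]] = []
    for j in range(n + 2):
        if j == n + 1:                  # G_{N+1} = I_{N+1}
            g.append(set(itp[n + 1]))
        elif j < i - k + 2:             # G_j = F_j
            g.append(set(frames[j]))
        elif j < i + 2:                 # G_j = F_j /\ I_{i-k+2} /\ ... /\ I_j
            conj = set(frames[j])
            for m in range(i - k + 2, j + 1):
                conj &= itp[m]
            g.append(conj)
        else:                           # G_j = F_j /\ I_j
            g.append(frames[j] & itp[j])
    return g


def is_safe_trace(g: List[Set[State]], succ: Dict[State, List[State]],
                  bad: Set[State]) -> bool:
    """G_j /\\ Tr => G'_{j+1} for consecutive frames, and no frame meets Bad."""
    def post(states: Set[State]) -> Set[State]:
        return {t for s in states for t in succ.get(s, [])}
    consecutive = all(post(g[j]) <= g[j + 1] for j in range(len(g) - 1))
    return consecutive and all(not (frame & bad) for frame in g)
```

When i = N, the second and fourth cases overlap at j = N + 1. The sketch gives the last case priority, which is our choice. On the toy system 0 → 1, 1 → 0, 2 → 3, 3 → 4, 4 → 4 with F = [{0}, {0, 1, 2}, {0, 1, 2, 3}], SEL (1, 2) and interpolants I<sub>1</sub> = {0, 1, 3}, I<sub>2</sub> = I<sub>3</sub> = {0, 1}, it produces G = [{0}, {0, 1}, {0, 1}, {0, 1}].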

Lemma 3 defines an obvious procedure to construct an (i, k)-extension trace *G* for *F*. However, such a *G* is, in general, neither monotone nor clausal. In the rest of this section, we describe the procedure kAvyExtend, which starts with a sequence interpolant (as in Lemma 3) but uses PdrBlock to systematically construct a safe monotone clausal extension of *F*.

The procedure kAvyExtend is shown in Algorithm 4. For simplicity of the presentation, we assume that PdrBlock does not use inductive generalization. The invariants marked by † rely on this assumption. We stress that the assumption is for presentation only. The correctness of kAvyExtend is independent of it.

kAvyExtend starts with a sequence interpolant according to the partitioning A. The extension trace *G* is initialized to *F*, and $G\_{N+1}$ is initialized to ⊤ (line 2). The rest proceeds in three phases:

- *Phase 1* (lines 3–5) computes the prefix $G\_{i-k+2}, \ldots, G\_{i+1}$ using the first k − 1 elements of *I*.
- *Phase 2* (lines 6–8) computes $G\_{i+1}$ using $I\_{i+1}$.
- *Phase 3* (lines 9–12) computes the suffix $\mathbf{G}^{i+2}$ using the last (N − i) elements of *I*. During this phase, PdrPush (line 12) pushes clauses forward so that they can be used in the next iteration.

The correctness of the phases follows from the invariants shown in Algorithm 4. We present each phase in turn.

Recall that PdrBlock takes a trace *F* (that is safe up to the last frame) and a transition system, and returns a safe strengthening of *F*, while ensuring that the result is monotone and clausal. Algorithm 4 maintains this guarantee by requiring that any clause added to any frame $G\_m$ of *G* is implicitly added to all frames below $G\_m$.
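A common way to realize the rule that a clause added to a frame is implicitly added to all frames below it is a delta encoding, used in many IC3/PDR implementations. We do not claim that kAvy does exactly this. Each clause is stored once, at the highest level where it is known to hold, and frame m is the set of clauses stored at level m or above. The class below is a hypothetical sketch; clauses are frozensets of DIMACS-style integer literals.

```python
from typing import FrozenSet, List, Set

Clause = FrozenSet[int]  # disjunction of literals; -v denotes the negation of v


class DeltaTrace:
    """Clausal frames G_0 .. G_n, where G_m is every clause stored at a level >= m.

    Storing a clause at level m adds it to G_0 .. G_m at once, so the clause set
    of G_m contains that of G_{m+1}. Hence G_m => G_{m+1} (monotonicity) holds
    by construction."""

    def __init__(self, n: int) -> None:
        self.delta: List[Set[Clause]] = [set() for _ in range(n + 1)]

    def add_clause(self, clause: Clause, level: int) -> None:
        """Add `clause` to frame `level` and, implicitly, to all frames below it."""
        for m in range(level):
            self.delta[m].discard(clause)  # lower copies are now redundant
        self.delta[level].add(clause)

    def frame(self, m: int) -> Set[Clause]:
        """Clauses of G_m: the union of the deltas at levels m .. n."""
        return set().union(*self.delta[m:])

    def push(self, clause: Clause, level: int) -> None:
        """Move `clause` from `level` to `level + 1`. The caller must already have
        checked that it is inductive relative to G_level, as a push step would."""
        self.delta[level].discard(clause)
        self.delta[level + 1].add(clause)
```

For example, adding the clause {1, −2} at level 2 of a four-frame trace makes it part of G<sub>0</sub>, G<sub>1</sub> and G<sub>2</sub> but not G<sub>3</sub>, until a push moves it up one level.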

*Phase 1.* By Lemma 2, the first k elements of the sequence interpolant computed at line 1 over-approximate states reachable in i + 1 steps of *Tr*. Phase 1 uses this to strengthen $G\_{i+1}$ using the first k elements of *I*. Note that in this phase, new clauses are always added to frame $G\_{i+1}$, and thus to all frames before it.

**Algorithm 4.** kAvyExtend. The invariants marked † hold only when PdrBlock does no inductive generalization.

**Input:** a monotone, clausal, safe trace *F* of size N

**Input:** a strong extension level (i, k) s.t. $\operatorname{Tr}\Vert\mathbf{F}^{i}\Vert^{k} \wedge \operatorname{Bad}(\bar{v}\_{N+1})$ is unsatisfiable

**Output:** a monotone, clausal, safe trace *G* of size N + 1

1. $I\_{i-k+2}, \ldots, I\_{N+1} \gets \mathrm{seqItp}(\operatorname{Tr}\Vert\mathbf{F}^{i}\Vert^{k} \wedge \operatorname{Bad}(\bar{v}\_{N+1}))$
2. $\mathbf{G} \gets [F\_0, \ldots, F\_N, \top]$
3. **for** $j \gets i-k+1$ **to** $i-1$ **do**
4. $\quad P\_j \gets G\_j \vee (G\_{i+1} \wedge I\_{j+1})$ // Inv<sub>1</sub>: *G* is monotone and clausal // Inv<sub>2</sub>: $G\_i \wedge \operatorname{Tr} \Rightarrow P'\_j$ // Inv<sup>†</sup><sub>3</sub>: $\forall j < m \le i+1 \cdot G\_m \equiv F\_m \wedge \bigwedge\_{\ell=i-k+1}^{j-1} (G\_\ell \vee I\_{\ell+1})$ // Inv<sub>3</sub>: $\forall j < m \le i+1 \cdot G\_m \Rightarrow F\_m \wedge \bigwedge\_{\ell=i-k+1}^{j-1} (G\_\ell \vee I\_{\ell+1})$
5. $\quad$[\_, \_, $G\_{i+1}$] ← PdrBlock([*Init*, $G\_i$, $G\_{i+1}$], (*Init*, *Tr*, $\neg P\_j$))
6. $P\_i \gets G\_i \vee (G\_{i+1} \wedge I\_{i+1})$
7. **if** i = 0 **then** [\_, \_, $G\_{i+1}$] ← PdrBlock([*Init*, $G\_{i+1}$], (*Init*, *Tr*, $\neg P\_i$))
8. **else** [\_, \_, $G\_{i+1}$] ← PdrBlock([*Init*, $G\_i$, $G\_{i+1}$], (*Init*, *Tr*, $\neg P\_i$)) // Inv<sup>†</sup><sub>4</sub>: $G\_{i+1} \equiv F\_{i+1} \wedge \bigwedge\_{\ell=i-k+1}^{i} (G\_\ell \vee I\_{\ell+1})$ // Inv<sub>4</sub>: $G\_{i+1} \Rightarrow F\_{i+1} \wedge \bigwedge\_{\ell=i-k+1}^{i} (G\_\ell \vee I\_{\ell+1})$
9. **for** $j \gets i+1$ **to** $N$ **do**
10. $\quad P\_j \gets G\_j \vee (G\_{j+1} \wedge I\_{j+1})$ // Inv<sub>6</sub>: $G\_j \wedge \operatorname{Tr} \Rightarrow P'\_j$
11. $\quad$[\_, \_, $G\_{j+1}$] ← PdrBlock([*Init*, $G\_j$, $G\_{j+1}$], (*Init*, *Tr*, $\neg P\_j$))
12. $\quad \mathbf{G} \gets$ PdrPush($\mathbf{G}$) // Inv<sup>†</sup><sub>7</sub>: *G* is an (i, k)-extension trace of *F* // Inv<sub>7</sub>: *G* is an extension trace of *F*
13. **return** *G*

Correctness of Phase 1 (line 5) follows from the loop invariant Inv<sub>2</sub>. It holds on loop entry since $G\_i \wedge \operatorname{Tr} \Rightarrow I'\_{i-k+2}$ (since $G\_i = F\_i$ and (♥)) and $G\_i \wedge \operatorname{Tr} \Rightarrow G'\_{i+1}$ (since *G* is initially a trace). Let $G\_i$ and $G^{\ast}\_i$ be the i-th frame before and after execution of iteration j of the loop, respectively. PdrBlock blocks $\neg P\_j$ at iteration j of the loop. Assume that Inv<sub>2</sub> holds at the beginning of the loop. Then, $G^{\ast}\_i \Rightarrow G\_i \wedge P\_j$ since PdrBlock strengthens $G\_i$. Since $G\_j \Rightarrow G\_i$ and $G\_i \Rightarrow G\_{i+1}$, this simplifies to $G^{\ast}\_i \Rightarrow G\_j \vee (G\_i \wedge I\_{j+1})$. Finally, since *G* is a trace, Inv<sub>2</sub> holds at the end of the iteration.

Inv<sub>2</sub> ensures that the trace given to PdrBlock at line 5 *can* be made safe relative to $P\_j$. From the post-condition of PdrBlock, it follows that at iteration j, $G\_{i+1}$ is strengthened to $G^{\ast}\_{i+1}$ such that $G^{\ast}\_{i+1} \Rightarrow P\_j$ and *G* remains a monotone clausal trace. At the end of *Phase 1*, $[G\_0, \ldots, G\_{i+1}]$ is a clausal monotone trace.

Interestingly, the calls to PdrBlock in this phase do not satisfy an expected pre-condition: the frame $G\_i$ in $[\mathit{Init}, G\_i, G\_{i+1}]$ might not be safe for property $P\_j$. However, we can see that $\mathit{Init} \Rightarrow P\_j$, and from Inv<sub>2</sub> it is clear that $P\_j$ is inductive relative to $G\_i$. This is a sufficient precondition for PdrBlock.

*Phase 2.* This phase strengthens $G\_{i+1}$ using the interpolant $I\_{i+1}$. After Phase 2, $G\_{i+1}$ is k-inductive relative to $F\_i$.

*Phase 3.* Unlike *Phase 1*, $G\_{j+1}$ is computed at the j-th iteration. Because of this, the property $P\_j$ in this phase is slightly different from that of Phase 1. Correctness follows from invariant Inv<sub>6</sub>, which ensures that at iteration j, $G\_{j+1}$ *can* be made safe relative to $P\_j$. From the post-condition of PdrBlock, it follows that $G\_{j+1}$ is strengthened to $G^{\ast}\_{j+1}$ such that $G^{\ast}\_{j+1} \Rightarrow P\_j$ and *G* is a monotone clausal trace. The invariant implies that at the end of the loop $G\_{N+1} \Rightarrow G\_N \vee I\_{N+1}$, making *G* safe. Thus, at the end of the loop *G* is a safe monotone clausal trace that is stronger than *F*. What remains is to show that $G\_{i+1}$ is k-inductive relative to $F\_i$.

Let φ be the formula from Lemma 2. Assuming that PdrBlock did no inductive generalization, *Phase 1* maintains Inv<sup>†</sup><sub>3</sub>, which states that at iteration j, PdrBlock strengthens frames $\lbrace G\_m \rbrace$, $j < m \le i+1$. Inv<sup>†</sup><sub>3</sub> holds on loop entry, since initially *G* = *F*. Let $G\_m$ and $G^{\ast}\_m$ ($j < m \le i+1$) be frame m at the beginning and at the end of the loop iteration, respectively. In the loop, PdrBlock adds clauses that block $\neg P\_j$. Thus, $G^{\ast}\_m \equiv G\_m \wedge P\_j$. Since $G\_j \Rightarrow G\_m$, this simplifies to $G^{\ast}\_m \equiv G\_m \wedge (G\_j \vee I\_{j+1})$. Expanding $G\_m$, we get $G^{\ast}\_m \equiv F\_m \wedge \bigwedge\_{\ell=i-k+1}^{j} (G\_\ell \vee I\_{\ell+1})$. Thus, Inv<sup>†</sup><sub>3</sub> holds at the end of the loop.

In particular, after line 8, $G\_{i+1} \equiv F\_{i+1} \wedge \bigwedge\_{\ell=i-k+1}^{i} (G\_\ell \vee I\_{\ell+1})$. Since $\varphi \Rightarrow G\_{i+1}$, $G\_{i+1}$ is k-inductive relative to $F\_i$.

**Theorem 2.** *Given a safe trace F of size N and an SEL (i, k) for F,* kAvyExtend *returns a clausal monotone extension trace G of size N + 1. Furthermore, if* PdrBlock *does no inductive generalization, then G is an (i, k)-extension trace.*

Of course, assuming that PdrBlock does no inductive generalization is not realistic. kAvyExtend remains correct without the assumption: it returns a trace *G* that is a monotone clausal extension of *F*. However, *G* might be stronger than any (i, k)-extension of *F*. The invariants marked with † are then relaxed to their unmarked versions. Overall, inductive generalization improves kAvyExtend since it is not restricted to only a k-inductive strengthening.

Importantly, the output of kAvyExtend is a regular inductive trace. Thus, kAvyExtend is a procedure to strengthen a (relatively) k-inductive certificate to a (relatively) 1-inductive certificate. Hence, after kAvyExtend, any strategy for further generalization or trace extension from IC3, Pdr, or Avy is applicable.

#### **4.2 Searching for the Maximal SEL**

In this section, we describe two algorithms for computing the maximal SEL. Both algorithms can be used to implement line 5 of Algorithm 3. They perform a guided search for group minimal unsatisfiable subsets and terminate when having fewer clauses would not increase the SEL further. The first, called *top-down*, starts from the largest unrolling of *Tr* and then reduces the length of the unrolling. The second, called *bottom-up*, finds the largest (regular) extension level first, and then grows it using strong induction.


*Top-Down SEL.* A pair (i, k) is the maximal SEL iff

$$\begin{aligned} i &= \max \left\{ j \mid 0 \le j \le N \cdot Tr \| \mathbf{F}^j \| ^{j+1} \wedge \operatorname{Bad}(\bar{v}\_{N+1}) \Rightarrow \bot \right\} \\ k &= \min \left\{ \ell \mid 1 \le \ell \le (i+1) \cdot Tr \| \mathbf{F}^i \| ^{\ell} \wedge \operatorname{Bad}(\bar{v}\_{N+1}) \Rightarrow \bot \right\} \end{aligned}$$

Note that k depends on i. For a SEL (i, k) ∈ K(*F*), we refer to the formula $\operatorname{Tr}[\mathbf{F}^{i}]$ as a *suffix* and to the number k as the *depth of induction*. Thus, the search can be split into two phases:

- (a) find the smallest suffix while using the maximal depth of induction allowed for that suffix;
- (b) minimize the depth of induction k for the value of i found in step (a).

This is captured in Algorithm 5. The algorithm requires at most (N + 1) sat queries. One downside, however, is that the formulas constructed in the first phase (line 3) are large because the depth of induction is the maximum possible.
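The two-phase top-down search can be read directly off the equations for the maximal SEL above. The sketch below assumes a hypothetical oracle `is_sel(i, k)` that returns True iff $\operatorname{Tr}\Vert\mathbf{F}^{i}\Vert^{k} \wedge \operatorname{Bad}(\bar{v}\_{N+1})$ is unsatisfiable, for example via one SAT call. It is not the authors' Algorithm 5. This plain linear scan is our own and is not tuned for the number of queries.

```python
from typing import Callable, Tuple

SelOracle = Callable[[int, int], bool]  # is_sel(i, k): True iff (i, k) is an SEL


def top_down_sel(n: int, is_sel: SelOracle) -> Tuple[int, int]:
    """Maximal SEL (i, k): the largest level i, then the smallest depth k."""
    # Phase (a): smallest suffix (largest i), using the maximal depth k = i + 1.
    i = next((j for j in range(n, -1, -1) if is_sel(j, j + 1)), None)
    if i is None:
        raise ValueError("no SEL: the trace cannot be extended")
    # Phase (b): smallest depth of induction for this i; depth i + 1 is known to work.
    k = next(depth for depth in range(1, i + 2) if is_sel(i, depth))
    return i, k
```

For instance, with N = 2 and a mock oracle whose SELs are exactly (0, 1), (1, 2) and (2, 3), the search returns (2, 3).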

*Bottom-Up SEL.* Algorithm 6 searches for a SEL by first finding a maximal regular extension level (line 2) and then searching for larger SELs (lines 6 to 10). Observe that if (j, ℓ) ∉ K(*F*), then ∀p > j · (p, ℓ) ∉ K(*F*). This is used at line 7 to increase the depth of induction once it is known that (j, ℓ) ∉ K(*F*). On the other hand, if (j, ℓ) ∈ K(*F*), there might be a larger SEL (j + 1, ℓ). Thus, whenever a SEL (j, ℓ) is found, it is stored in (i, k) and the search continues (line 10). The algorithm terminates when there are no more valid SEL candidates and returns the last valid SEL. Note that ℓ is incremented only when there does not exist a larger SEL with the current value of ℓ. Thus, for each valid level j, if there exist SELs with level j, the algorithm is guaranteed to find the largest such SEL. Moreover, the level is increased at every possible opportunity. Hence, at the end (i, k) = max K(*F*).
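The bottom-up strategy can be sketched against a hypothetical oracle `is_sel(i, k)` that reports whether (i, k) is an SEL. The sketch follows the prose description rather than the exact line structure of Algorithm 6. It assumes that (0, 1) is an SEL, which line 4 of Algorithm 3 guarantees, and it relies on the observation that (j, ℓ) ∉ K(*F*) implies (p, ℓ) ∉ K(*F*) for all p > j.

```python
from typing import Callable, Tuple

SelOracle = Callable[[int, int], bool]  # is_sel(i, k): True iff (i, k) is an SEL


def bottom_up_sel(n: int, is_sel: SelOracle) -> Tuple[int, int]:
    """Return max K(F), assuming (0, 1) is an SEL and that a failed depth
    stays failed at every higher level."""
    # Maximal regular (k = 1) extension level, scanning upward from level 0.
    w = 0
    while w + 1 <= n and is_sel(w + 1, 1):
        w += 1
    best = (w, 1)
    level, depth = w + 1, 2
    while level <= n and depth <= level + 1:   # Proposition 2: k <= i + 1
        if is_sel(level, depth):
            best = (level, depth)  # record it and try the next level, same depth
            level += 1
        else:
            depth += 1             # no SEL with this depth at `level` or above
    return best
```

Each query either raises the level or raises the depth, so the search stops after a number of queries linear in N.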

**Fig. 2.** Runtime comparison on SAFE HWMCC instances (a) and *shift* instances (b).

In the worst case, Algorithm 6 makes at most 3N sat queries. However, compared to Algorithm 5, the queries are smaller. Moreover, the computation is incremental and can be aborted with a sub-optimal solution after execution of line 5 or line 9. Note that at line 5, i is a regular extension level (i.e., as in Avy), and every execution of line 9 results in a larger SEL.

#### **5 Evaluation**

We implemented kAvy on top of the Avy Model Checker<sup>1</sup>. For line 5 of Algorithm 3, we used Algorithm 5. We evaluated kAvy's performance against a version of Avy [29] from the Hardware Model Checking Competition 2017 [5], and against the Pdr engine of ABC [13]. We used the benchmarks from HWMCC'14, '15, and '17. Benchmarks that are not solved by any of the solvers are excluded from the presentation. The experiments were conducted on a cluster running Intel E5-2683 V4 CPUs at 2.1 GHz with an 8 GB RAM limit and a 30 min time limit.

The results are summarized in Table 1. The HWMCC has a wide variety of benchmarks. We aggregate the results based on the competition, and also on benchmark origin (based on the name). Some named categories (e.g., *intel*) include benchmarks that have not been included in any competition. The first column in Table 1 indicates the category. **Total** is the number of all available benchmarks, ignoring duplicates. That is, if a benchmark appeared in multiple categories, it is counted only once. Numbers in brackets indicate the number of instances that are solved uniquely by the solver. For example, kAvy solves 14 instances in *oc8051* that are not solved by any other solver. The VBS column indicates the *Virtual Best Solver*: the result of running all three solvers in parallel and stopping as soon as one solver terminates successfully.
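The Virtual Best Solver time and the bracketed unique-solve counts can be derived from per-solver results as sketched below. An instance counts as solved by the VBS if any solver solves it, with the smallest successful running time. The data layout and the function names are our own, and any solver or instance names used with them are hypothetical, not taken from Table 1.

```python
from typing import Dict, Optional, Set

Times = Dict[str, Dict[str, Optional[float]]]  # times[solver][instance]; None = timeout


def virtual_best(times: Times) -> Dict[str, float]:
    """VBS time per instance: the fastest successful run over all solvers.
    Instances that no solver solves are omitted."""
    vbs: Dict[str, float] = {}
    for per_instance in times.values():
        for inst, t in per_instance.items():
            if t is not None and (inst not in vbs or t < vbs[inst]):
                vbs[inst] = t
    return vbs


def unique_solves(times: Times, solver: str) -> Set[str]:
    """Instances solved by `solver` and by no other solver."""
    def solved(s: str) -> Set[str]:
        return {inst for inst, t in times[s].items() if t is not None}
    others: Set[str] = set()
    for s in times:
        if s != solver:
            others |= solved(s)
    return solved(solver) - others
```

Taking the minimum over successful runs matches the "run in parallel, stop at the first success" definition, ignoring any overhead of actually running the solvers side by side.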

Overall, kAvy solves more safe instances than both Avy and Pdr, while taking less time than Avy (we report time for solved instances, ignoring timeouts). The VBS column shows that kAvy is a promising new strategy, significantly improving overall performance. In the rest of this section, we analyze the results in more detail, provide a detailed run-time comparison between the tools, and isolate the effect of the new k-inductive strategy.

<sup>1</sup> All code, benchmarks, and results are available at https://arieg.bitbucket.io/avy/.

**Table 1.** Summary of instances solved by each tool. Timeouts were ignored when computing the time column.

To compare running times, we present scatter plots comparing kAvy and Avy (Fig. 3a), and kAvy and Pdr (Fig. 3b). In both figures, kAvy is on the x-axis. Points above the diagonal are better for kAvy. Compared to Avy, whenever an instance is solved by both solvers, kAvy is often faster, sometimes by orders of magnitude. kAvy and Pdr, on the other hand, perform well on very different instances. This is similar to the observation made by the authors of the original paper that presented Avy [29]. Another indicator of performance is the depth of convergence, summarized in Fig. 3d and e. kAvy often converges much sooner than Avy. The comparison with Pdr is less clear, which is consistent with the difference in performance between the two. To get the whole picture, Fig. 2a presents a cactus plot that compares the running times of the algorithms on all these benchmarks.

To isolate the effects of k-induction, we compare kAvy to a version of kAvy with k-induction disabled, which we call vanilla. Conceptually, vanilla is similar to Avy since it extends the trace using a 1-inductive extension trace, but its implementation is based on kAvy. The results for the running time and the depth of convergence are shown in Fig. 3c and f, respectively. The results are very clear: using strong extension traces significantly improves performance and has a non-negligible effect on the depth of convergence.

Finally, we discovered one family of benchmarks, called *shift*, on which kAvy performs orders of magnitude better than all other techniques. The benchmarks come from encoding bit-vector decision problems into circuits [21,30]. The *shift* family corresponds to deciding satisfiability of (x + y) = (x << 1) for two bit-vectors x and y. The family is parameterized by bit-width. The property is k-inductive, where k is the bit-width of x. The results of running Avy, Pdr, k-induction<sup>2</sup>, and kAvy are shown in Fig. 2b. Except for kAvy, all techniques exhibit exponential behavior in the bit-width, while kAvy's running time remains constant. Deeper analysis indicates that kAvy finds a small inductive invariant while exploring just two steps in the execution of the circuit. At the same time, neither inductive generalization nor k-induction alone is able to consistently find the same invariant quickly.

**Fig. 3.** Comparing running time ((a), (b), (c)) and depth of convergence ((d), (e), (f)) of Avy, Pdr and vanilla with kAvy. kAvy is shown on the x-axis. Points above the diagonal are better for kAvy. Only those instances that have been solved by both solvers are shown in each plot.

#### **6 Conclusion**

In this paper, we present kAvy, an SMC algorithm that effectively uses k-inductive reasoning to guide interpolation and inductive generalization. kAvy searches both for a good inductive strengthening and for the most effective induction depth k. We have implemented kAvy on top of the Avy Model Checker. The experimental results on HWMCC instances show that our approach is effective.

The search for the maximal SEL is an overhead in kAvy. There could be benchmarks in which this overhead outweighs its benefits, although we have not come across such benchmarks so far. In such cases, kAvy can choose to settle for a sub-optimal SEL, as mentioned in Sect. 4.2. Deciding when and how much to settle for remains a challenge.

<sup>2</sup> We used the k-induction engine ind in ABC [8].

**Acknowledgements.** We thank the anonymous reviewers and Oded Padon for their thorough review and insightful comments. This research was enabled in part by support provided by Compute Ontario (https://computeontario.ca/), Compute Canada (https://www.computecanada.ca/) and the grants from Natural Sciences and Engineering Research Council Canada.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Verifying Asynchronous Event-Driven Programs Using Partial Abstract Transformers**

Peizun Liu<sup>1</sup>(✉), Thomas Wahl<sup>1</sup>, and Akash Lal<sup>2</sup>

<sup>1</sup> Northeastern University, Boston, USA lpzun@ccs.neu.edu <sup>2</sup> Microsoft Research, Bangalore, India

**Abstract.** We address the problem of analyzing asynchronous event-driven programs, in which concurrent agents communicate via unbounded message queues. The safety verification problem for such programs is undecidable. We present in this paper a technique that combines *queue-bounded exploration* with a *convergence test*: if the sequence of certain abstractions of the reachable states, for increasing queue bounds k, converges, we can prove any property of the program that is preserved by the abstraction. If the abstract state space is finite, convergence is *guaranteed*; the challenge is to catch the point kmax where it happens. We further demonstrate how simple invariants formulated over the *concrete* domain can be used to eliminate spurious *abstract* states, which otherwise prevent the sequence from converging. We have implemented our technique for the P programming language for event-driven programs. We show experimentally that the sequence of abstractions often converges fully automatically, in hard cases with minimal designer support in the form of sequentially provable invariants, and that this happens for a value of kmax small enough to allow the method to succeed in practice.

#### **1 Introduction**

*Asynchronous event-driven (AED) programming* refers to a style of programming multi-agent applications. The agents communicate shared work via messages. Each agent waits for a message to arrive, and then processes it, possibly sending messages to other agents, in order to collectively achieve a goal. This programming style is common for distributed systems as well as low-level designs such as device drivers [11]. Getting such applications right is an arduous task, due to the inherent concurrency: the programmer must defend against all possible interleavings of messages between agents. In response to this challenge, recent years have seen multiple approaches to verifying AED-like programs, e.g. by delaying send actions, or temporarily bounding their number (to keep queue sizes small) [7,10], or by reasoning about a small number of representative execution schedules, to avoid interleaving explosion [5].

Work supported by the US National Science Foundation under Grant No. 1253331, and by Microsoft Research India while hosting the second author for a sabbatical.

In this paper we consider the P language for AED programming [11]. A P program consists of multiple state machines running in parallel. Each machine has a local store, and a message queue through which it receives events from other machines. P allows the programmer to formulate safety specifications via a statement that asserts some predicate over the local state of a single machine. Verifying such reachability properties of course requires reasoning over global system behavior and is, for unbounded-queue P programs, undecidable [8].

The unboundedness of the reachable state space does not prevent the use of testing tools that try to explore as much of the state space as possible [3,6,11,13] in the quest for bugs. Somewhat inspired by this kind of approach, the goal of this paper is a verification technique that can (sometimes) *prove* a safety property, despite exploring only a finite fraction of that space. Our approach is as follows. Assuming that the machines' queues are the only source of unboundedness, we consider a bound k on the queue size, and exhaustively compute the reachable states R<sup>k</sup> of the resulting finite-state problem, checking the local assertion Φ along the way. We then increase the queue bound until (an error is found, or) we reach some point kmax of *convergence*: a point that allows us to conclude that increasing k further is not required to prove Φ.

What kind of "convergence" are we targeting? We design a sequence $(R\_k)\_{k=0}^{\infty}$ of abstractions of each reachability set over a *finite* abstract state space. Due to the monotonicity of the sequence $(R\_k)\_{k=0}^{\infty}$, this ensures convergence, i.e. the existence of $k\_{\max}$ such that $R\_K = R\_{k\_{\max}}$ for all $K \ge k\_{\max}$. Provided that an abstract state satisfies Φ exactly if all its concretizations do, we have: if all abstract states in $R\_{k\_{\max}}$ comply with Φ, then so do all reachable concrete states of P, and we have proved the property.

We implement this strategy using an abstraction function α with a finite co-domain that leaves the local state of a machine unchanged and maintains the *first occurrence* of each event in the queue; repeat occurrences are dropped. This abstraction preserves properties over the local state and the head of the queue, i.e. the visible (to the machine) part of the state space, which is typically sufficient to express reachability properties.
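A minimal sketch of this queue abstraction follows. It assumes that a machine's state is a pair of a hashable local state and a tuple of queued event names; the names `alpha` and `abstract_queues` and this encoding are our assumptions. Only the first occurrence of each event is kept, so the abstract queue has at most as many entries as there are event types, which keeps the co-domain finite.

```python
from itertools import product
from typing import Hashable, Iterable, List, Set, Tuple

Event = str
AbsState = Tuple[Hashable, Tuple[Event, ...]]


def alpha(local: Hashable, queue: Tuple[Event, ...]) -> AbsState:
    """Keep the local state and the first occurrence of each event, in queue order."""
    seen: Set[Event] = set()
    firsts: List[Event] = []
    for e in queue:
        if e not in seen:        # repeat occurrences are dropped
            seen.add(e)
            firsts.append(e)
    return local, tuple(firsts)


def abstract_queues(events: Iterable[Event], k: int, local: Hashable = "s") -> Set[AbsState]:
    """Abstractions of all queues of length <= k over `events`. This is a toy
    stand-in for R_k in which every queue content is assumed reachable."""
    evs = tuple(events)
    return {alpha(local, q) for n in range(k + 1) for q in product(evs, repeat=n)}
```

The head of a non-empty queue is always its own first occurrence, so it survives the abstraction. With two event types such as Ping and Done, the toy abstract sets have sizes 1, 3, 5, 5, … for k = 0, 1, 2, 3, …. The sequence grows monotonically in a finite domain and stops growing at k = 2.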

The second major step in our approach is the detection of the point of convergence of $(R\_k)\_{k=0}^{\infty}$. We show that, for the *best abstract transformer Im* [9,27, see Sect. 4.2], if $\mathit{Im}(R\_k) \subseteq R\_k$, then $R\_K = R\_k$ for all $K \ge k$. In fact, we have a stronger result: under an easy-to-enforce condition, it suffices to consider abstract *dequeue operations*. All others, namely enqueue and local actions, never lead to abstract states in $R\_{k+1} \setminus R\_k$. The best abstract transformer for dequeue actions is efficiently implementable for a given P program.

It is of course possible that the convergence condition $\mathit{Im}(R\_k) \subseteq R\_k$ never holds (the problem is undecidable). This manifests in the presence of a *spurious* abstract state in the image produced by *Im*, i.e. one whose concretization does not contain any reachable state. Our third contribution is a technique to assist users in eliminating such states, enhancing the chances for convergence. We have observed that spurious abstract states are often due to violations of simple *machine invariants*: invariants that do not depend on the behavior of other machines. By their nature, they can be proved using a cheap sequential analysis.

We can eliminate an abstract state (e.g. produced by *Im*) if *all* its concretizations violate a machine invariant. In this paper, we propose a domain-specific temporal logic to express invariants over machines with event queues and, more importantly, an algorithm that decides the above *abstract queue invariant checking* problem, by reducing it efficiently to a plain model checking problem. We have used this technique to ensure the convergence in "hard" cases that otherwise defy convergence of the abstract reachable states sequence.

We have implemented our technique for the P language and empirically evaluated it on an extensive set of benchmark programs. The experimental results support the following conclusions: (i) for our benchmark programs, the sequence of abstractions often converges fully automatically, in hard cases with minimal designer support in the form of separately dischargeable invariants; (ii) almost all examples converge at a small value of $k_{\max}$; and (iii) the overhead our technique adds to the bounding technique is small: the bulk is spent on the exhaustive bounded exploration itself.

Proofs and other supporting material can be found in the Appendix of [23].

#### **2 Overview**

We illustrate the main ideas of this paper using an example in the P language. A machine in a P program consists of multiple states. Each state defines an entry code block that is executed when the machine enters the state. The state also defines handlers for each event type e that it is prepared to receive. A handler can either be on e do foo (executing foo on receiving e), or ignore e (dequeuing and dropping e). A state can also have a defer e declaration; the semantics is that a machine dequeues the first non-deferred event in its queue. As a result, a queue in a P program is not strictly FIFO. This relaxation is an important feature of P that helps programmers express their logic compactly [11]. Figure 1 shows a P program named *PiFl*, in which a Sender (eventually) floods a Receiver's queue with Ping events. This queue is the only source of unboundedness in *PiFl*.

A critical property for P programs is *(bounded) responsiveness*: the receiving machine must have a handler (e.g. on, defer, ignore) for every event arriving at the queue head; otherwise the event will come as a "surprise" and crash the machine. To prove responsiveness for *PiFl*, we have to demonstrate (among other things) that in state Ignore it, the Done event is never at the head of the Receiver's queue. We cannot perform exhaustive model checking, since the set of reachable states is infinite. Instead, we will compute a conservative abstraction of this set that is precise enough to rule out Done events at the queue head in this state.

We first define a suitable abstraction function α that collapses repeated occurrences of events to each event's first occurrence. For instance, the queue

$$\mathcal{Q} = \text{Prime}.\text{Prime}.\text{Prime}.\text{Done}.\text{Ping}.\text{Ping}.\text{Ping}.\text{Ping} \tag{1}$$

**Fig. 1.** *PiFl*: a Ping-Flood scenario. The Sender and the Receiver communicate via events of types Prime, Done, and Ping. After sending some Prime events and one Done, the Sender floods the Receiver with Pings. The Receiver initially defers Primes. Upon receiving Done it enters a state in which it ignores Ping.

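will be abstracted to $\overline{\mathcal{Q}} = \alpha(\mathcal{Q}) = \text{Prime}.\text{Done}.\text{Ping}$. The number of possible abstract queues is *finite*: $1 + 3 + 3 \cdot 2 + 3 \cdot 2 \cdot 1 = 16$ (the empty queue, plus the repetition-free sequences of length 1, 2, and 3 over the three event types). The abstraction preserves the head of the queue; together with the machine state, this is enough information to check responsiveness.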
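The count of 16 abstract queues and the abstraction of Eq. (1) can be reproduced with a few lines of Python. This is a minimal sketch assuming a prefix size of 0, so that the abstraction keeps just the first occurrence of each event; the helper name `alpha0` is ours, not part of P or of the tool.

```python
from math import perm

EVENTS = ["Prime", "Done", "Ping"]

def alpha0(queue):
    """List abstraction with prefix size 0: keep the first occurrence of each event, in order."""
    # dict preserves insertion order (Python 3.7+), so duplicates after the first are dropped.
    return list(dict.fromkeys(queue))

# Eq. (1): three Primes, one Done, four Pings.
Q = ["Prime"] * 3 + ["Done"] + ["Ping"] * 4
print(alpha0(Q))  # ['Prime', 'Done', 'Ping']

# An abstract queue (prefix size 0) is a repetition-free sequence over EVENTS;
# there are perm(3, j) such sequences of length j.
n_abstract = sum(perm(len(EVENTS), j) for j in range(len(EVENTS) + 1))
print(n_abstract)  # 16
```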

We now generate the sequence $\overline{R}_k$ of abstractions of the reachable state sets $R_k$ for queue size bounds $k = 0, 1, 2, \dots$, by computing each finite set $R_k$, and then $\overline{R}_k$ as $\alpha(R_k)$. The obtained monotone sequence $(\overline{R}_k)_{k=0}^{\infty}$ over a finite domain will eventually converge, but we must prove that it has. This is done by applying the *best abstract transformer* $\overline{Im}$, restricted to dequeue operations (defined in Sect. 4.2), to the current set $\overline{R}_k$, and confirming that the result is contained in $\overline{R}_k$.

As it turns out, the confirmation fails for the *PiFl* program: $k = 5$ marks the first time the set $\overline{R}_k$ repeats, i.e. $\overline{R}_4 = \overline{R}_5$, so we are motivated to run the convergence test. Unfortunately, we find a state $\bar{s} \in \overline{Im}(\overline{R}_5) \setminus \overline{R}_5$, preventing convergence. Our approach now offers two remedies to this dilemma. One is to refine the queue abstraction. In our implementation, function $\alpha$ is really $\alpha_p$, for a parameter $p$ that denotes the size of the *prefix* of the queue that is kept unchanged by the abstraction. For example, for the queue from Eq. (1) we have $\alpha_4(\mathcal{Q}) = \text{Prime}.\text{Prime}.\text{Prime}.\text{Done} \mid \text{Ping}$, where $\mid$ separates the prefix from the "infinite tail" of the abstract queue. This (straightforward) refinement maintains finiteness of the abstraction and increases precision, by revealing that the queue starts with three Prime events. Re-running the analysis for the *PiFl* program with $p = 4$, at $k = 5$ we find $\overline{Im}(\overline{R}_5) \subseteq \overline{R}_5$, and the proof is complete.

The second remedy to the failed convergence test is more powerful but also less automatic. Let's revert to prefix $p = 0$ and inspect the abstract state $\bar{s} \in \overline{Im}(\overline{R}_5) \setminus \overline{R}_5$ that foils the test. We find that it features a Done event followed by a Prime event in the Receiver's queue. A simple static analysis of the Sender's machine in isolation shows that it permits no path from the send Done to the send Prime statement. The behavior of other machines is irrelevant for this invariant; we call it a *machine invariant*. We pass the invariant to our tool via the command line using the expression

$$\mathbf{G}\,(\text{Done} \Rightarrow \mathbf{G}\,\neg\text{Prime}) \tag{2}$$

in a temporal-logic-like notation called QuTL (Sect. 5.1), where $\mathbf{G}$ universally quantifies over all queue entries. Our tool includes a QuTL checker that determines that **every concretization** of $\bar{s}$ violates property (2), concluding that $\bar{s}$ is spurious and can be discarded. This turns out to be sufficient for convergence.

#### **3 Queue-(Un)Bounded Reachability Analysis**

**Communicating Queue Systems.** We consider P programs consisting of a fixed and known number n of machines communicating via event passing through unbounded FIFO queues.<sup>1</sup> For simplicity, we assume the machines are created at the start of the program; dynamic creation at a later time can be simulated by having the machine ignore all events until it receives a special creation event.

We model such a program as a *communicating queue system* (CQS). Formally, given $n \in \mathbb{N}$, a CQS $\mathcal{P}^n$ is a collection of $n$ *queue automata* (QA) $\mathcal{P}_i = (\Sigma, \mathcal{L}_i, Act_i, \Delta_i, \ell^I_i)$, $1 \le i \le n$. A QA consists of a finite queue alphabet $\Sigma$ shared by all QA, a finite set $\mathcal{L}_i$ of local states, a finite set $Act_i$ of action labels, a finite set $\Delta_i \subseteq \mathcal{L}_i \times (\Sigma \cup \{\varepsilon\}) \times Act_i \times \mathcal{L}_i \times (\Sigma \cup \{\varepsilon\})$ of transitions, and an initial local state $\ell^I_i \in \mathcal{L}_i$. An action label $act \in Act_i$ is of the form


The individual QA of a CQS model machines of a P program; hence we refer to QA states as *machine states*. A transmit action is the only communication mechanism among the QA.

*Semantics.* A *machine state* $m$ of a QA is of the form $(\ell, \mathcal{Q}) \in \mathcal{L} \times \Sigma^*$; state $m^I = (\ell^I, \varepsilon)$ is *initial*. We define machine transitions corresponding to internal actions as follows (transmit actions are defined later at the global level):

$$\frac{(\ell,\varepsilon)\stackrel{loc}{\to}(\ell',\varepsilon)\in\Delta}{(\ell,\mathcal{Q})\to(\ell',\mathcal{Q})}\quad\text{for }\ell,\ell'\in\mathcal{L},\ \mathcal{Q}\in\Sigma^{\*}\qquad\qquad(\text{local})$$

$$\frac{(\ell,e)\stackrel{deq}{\to}(\ell',\varepsilon)\in\Delta}{(\ell,e\mathcal{Q})\to(\ell',\mathcal{Q})}\quad\text{for }\ell,\ell'\in\mathcal{L},\ e\in\Sigma,\ \mathcal{Q}\in\Sigma^{\*}\qquad\qquad(\text{dequeue})$$

<sup>1</sup> The P language permits unbounded machine creation, a feature that we do not allow here and that is not used in any of the benchmarks we are aware of.

A *(global) state* $s$ of a CQS is a tuple $\langle (\ell_1, \mathcal{Q}_1), \dots, (\ell_n, \mathcal{Q}_n) \rangle$ where $(\ell_i, \mathcal{Q}_i) \in \mathcal{L}_i \times \Sigma^*$ for $i \in \{1, \dots, n\}$. State $s^I = \langle (\ell^I_1, \varepsilon), \dots, (\ell^I_n, \varepsilon) \rangle$ is initial. We extend the machine transition relation $\to$ to states as follows:

$$\langle (\ell_1, \mathcal{Q}_1), \dots, (\ell_n, \mathcal{Q}_n) \rangle \to \langle (\ell'_1, \mathcal{Q}'_1), \dots, (\ell'_n, \mathcal{Q}'_n) \rangle$$

if there exists i ∈ {1,...,n} such that one of the following holds:

**(internal)** $(\ell_i, \mathcal{Q}_i) \to (\ell'_i, \mathcal{Q}'_i)$, and for all $k \in \{1, \dots, n\} \setminus \{i\}$, $\ell'_k = \ell_k$ and $\mathcal{Q}'_k = \mathcal{Q}_k$;

**(transmission)** there exists $j \in \{1, \dots, n\}$ and $e \in \Sigma$ such that:


The execution model of a CQS is strictly interleaving. That is, in each step, one of the two above transitions **(internal)** or **(transmission)** is performed for a nondeterministically chosen machine i.

**Queue-Bounded and Queue-Unbounded Reachability.** Given a CQS $\mathcal{P}^n$, a state $s = \langle (\ell_1, \mathcal{Q}_1), \dots, (\ell_n, \mathcal{Q}_n) \rangle$, and a number $k$, the *queue-bounded reachability problem* (for $s$ and $k$) determines whether $s$ is *reachable under queue bound $k$*, i.e. whether there exists a path $s_0 \to s_1 \to \dots \to s_z$ such that $s_0 = s^I$, $s_z = s$, and for $i \in \{0, \dots, z\}$, all queues in state $s_i$ have at most $k$ events. Queue-bounded reachability for $k$ is trivially decidable, by making enqueue actions for queues of size $k$ *blocking* (the sender cannot continue), which results in a finite state space. We write $R_k = \{s : s \text{ is reachable under queue bound } k\}$.
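To make queue-bounded reachability concrete, the following sketch explores a CQS breadth-first and blocks any send to a queue that already holds $k$ events, so the explored state space is finite. The transition encoding (tuples `(src, kind, event, dst, target)`) and the function name `bounded_reach` are hypothetical simplifications of our own, not the formal $\Delta_i$ above; P features such as defer and ignore are omitted.

```python
from collections import deque

# Hypothetical, simplified CQS encoding: machines[i] lists transitions
# (src, kind, event, dst, target) where kind is one of
#   "loc":  local action (event and target unused)
#   "deq":  remove `event` from the head of machine i's own queue
#   "send": append `event` to the queue of machine `target`
def bounded_reach(machines, init, k):
    """Return R_k: all global states reachable while no queue exceeds k events."""
    start = tuple((l, ()) for l in init)
    seen = {start}
    work = deque([start])
    while work:
        s = work.popleft()
        for i, trans in enumerate(machines):
            li, qi = s[i]
            for src, kind, ev, dst, tgt in trans:
                if src != li:
                    continue
                t = list(s)
                if kind == "loc":
                    t[i] = (dst, qi)
                elif kind == "deq":
                    if not qi or qi[0] != ev:
                        continue
                    t[i] = (dst, qi[1:])
                elif kind == "send":
                    if len(s[tgt][1]) >= k:
                        continue  # enqueue blocks once the target queue holds k events
                    t[i] = (dst, qi)
                    lt, qt = t[tgt]  # read after the update, so sends to self also work
                    t[tgt] = (lt, qt + (ev,))
                succ = tuple(t)
                if succ not in seen:
                    seen.add(succ)
                    work.append(succ)
    return seen

# Toy example: a sender floods a receiver that dequeues Pings.
sender = [("s", "send", "Ping", "s", 1)]
receiver = [("r", "deq", "Ping", "r", None)]
R3 = bounded_reach([sender, receiver], ["s", "r"], 3)
print(len(R3))  # 4: the receiver queue holds 0..3 Pings
```

Because sends block at the bound, every call terminates, and the sets returned for larger $k$ contain those for smaller $k$.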

Queue-bounded reachability will be used in this paper as a tool for solving our actual problem of interest: Given a CQS $\mathcal{P}^n$ and a state $s$, the *Queue-UnBounded reachability Analysis (QUBA) problem* determines whether $s$ is reachable, i.e. whether there exists a (queue-unbounded) path from $s^I$ to $s$. The QUBA problem is undecidable [8]. We write $R$ $\left(= \bigcup_{k \in \mathbb{N}} R_k\right)$ for the set of reachable states.

#### **4 Convergence via Partial Abstract Transformers**

In this section, we formalize our approach to detecting the convergence of a suitable sequence of *observations* about the states $R_k$ reachable under $k$-bounded semantics. We define the observations as abstractions of those states, resulting in sets $\overline{R}_k$. We then investigate the convergence of the sequence $(\overline{R}_k)_{k=0}^{\infty}$.

#### **4.1 List Abstractions of Queues**

Our abstraction function applies to queues, as defined below. Its action on machine and system states then follows from the hierarchical design of a CQS. Let |Q| denote the number of events in Q, and Q[i] the ith event in Q (0 ≤ i < |Q|).

**Definition 1.** *For a parameter* $p \in \mathbb{N}$*, the list abstraction function* $\alpha_p : \Sigma^* \to \Sigma^*$ *is defined as follows:*

*1.* $\alpha_p(\varepsilon) = \varepsilon$*.*

*2. For a non-empty queue* $\mathcal{Q} = P \cdot e$*,*

$$\alpha\_p(\mathcal{Q}) = \begin{cases} \alpha\_p(P) & \text{if there exists } j \text{ } s.t. \ p \le j < |P| \text{ and } \mathcal{Q}[j] = e \\ \alpha\_p(P) \cdot e & \text{otherwise} \end{cases} \tag{3}$$

Intuitively, $\alpha_p$ abstracts a queue by leaving its first $p$ events unchanged (an idea also used in [16]). Starting from position $p$ it keeps only the first occurrence of each event $e$ in the queue, if any; repeat occurrences are dropped.<sup>2</sup> The preservation of existence and order of the first occurrences of all present events motivates the term *list abstraction*. An alternative is an abstraction that keeps only the *set* (not: list) of queue elements from position $p$, i.e. it ignores multiplicity *and order*. This is by definition less precise than the list abstraction and provided no efficiency advantages in our experiments. An abstraction that keeps only the queue head proved cheap but too imprecise.

The motivation for parameter p is that many protocols proceed in *rounds* of repeating communication patterns, involving a bounded number of message exchanges. If p exceeds that number, the list abstraction's loss of information may be immaterial.

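We write an abstract queue $\overline{\mathcal{Q}} = \alpha_p(\mathcal{Q})$ in the form *pref* $\mid$ *suff* s.t. $p = |\textit{pref}|$, and refer to *pref* as $\overline{\mathcal{Q}}$'s *prefix* (shared with $\mathcal{Q}$), and *suff* as $\overline{\mathcal{Q}}$'s *suffix*.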

**Example 2.** *The queues* $\mathcal{Q} \in \{bbbba, bbba, bbbaa\}$ *are* $\alpha_2$*-equivalent:* $\alpha_2(\mathcal{Q}) = bb \mid ba$*.*
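Definition 1 can be transcribed almost literally. The following Python sketch assumes events are single characters and returns the abstract queue as a `(prefix, suffix)` pair; it scans the queue left to right, which is equivalent to unfolding the recursion in Eq. (3). The function name `alpha` is ours.

```python
def alpha(queue, p):
    """List abstraction alpha_p of Definition 1, returned as (prefix, suffix)."""
    prefix = queue[:p]            # the first p events are kept unchanged
    suffix = ""
    for e in queue[p:]:
        # Eq. (3): drop e if it already occurs at some position j >= p.
        if e not in suffix:
            suffix += e
    return prefix, suffix

for q in ["bbbba", "bbba", "bbbaa"]:
    print(q, "->", alpha(q, 2))   # each maps to ('bb', 'ba')
```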

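We extend $\alpha_p$ to act on a machine state via $\alpha_p(\ell_i, \mathcal{Q}_i) = (\ell_i, \alpha_p(\mathcal{Q}_i))$, on a state via $\alpha_p(s) = \langle (\ell_1, \alpha_p(\mathcal{Q}_1)), \dots, (\ell_n, \alpha_p(\mathcal{Q}_n)) \rangle$, and on a set of states pointwise via $\alpha_p(S) = \{\alpha_p(s) : s \in S\}$.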

*Discussion.* The abstract state space is finite since the queue prefix is of fixed size, and each event in the suffix is recorded at most once (the event alphabet is finite). The sets of reachable abstract states grow monotonically with increasing queue size bound $k$, since the sets of reachable concrete states do:

$$k\_1 \le k\_2 \quad \Rightarrow \quad R\_{k\_1} \subseteq R\_{k\_2} \quad \Rightarrow \quad \alpha\_p(R\_{k\_1}) \subseteq \alpha\_p(R\_{k\_2}) \; .$$

Finiteness and monotonicity guarantee convergence of the sequence of reachable abstract states.

We say the abstraction function $\alpha_p$ *respects* a property of a state if, for any two $\alpha_p$-equivalent states (see Example 2), the property holds for both or for neither. Function $\alpha_p$ respects properties that refer to the local-state part of a machine, and to the first $p + 1$ events of its queue (which are preserved by $\alpha_p$). In addition, the property may look beyond the prefix and refer to the existence of events in the queue, but not to their frequency or their order after the first occurrence.

<sup>2</sup> Note that the head of the queue is always preserved by $\alpha_p$, even for $p = 0$.

The rich information preserved by the abstraction (despite being finite-state) especially pays off in connection with the defer feature in the P language, which allows machines to delay handling certain events at the head of a queue [11]. The machine identifies the first non-deferred event in the queue, a piece of information that is precisely preserved by the list abstraction (no matter what p).

**Definition 3.** *Given an abstract queue* $\overline{\mathcal{Q}} = e_0 \dots e_{p-1} \mid e_p \dots e_{z-1}$*, the concretization function* $\gamma_p : \Sigma^* \to 2^{\Sigma^*}$ *maps* $\overline{\mathcal{Q}}$ *to the* language *of the regular expression*

$$RE\_p(\overline{\mathcal{Q}}) := e\_0 \dots e\_{p-1} e\_p \{e\_p\}^\* e\_{p+1} \{e\_p, e\_{p+1}\}^\* \dots e\_{z-1} \{e\_p, \dots, e\_{z-1}\}^\*,\tag{4}$$

*i.e.* $\gamma_p(\overline{\mathcal{Q}}) := \mathcal{L}(\textit{RE}_p(\overline{\mathcal{Q}}))$*.*

As a special case, $\textit{RE}_p(\varepsilon) = \varepsilon$ and so $\gamma_p(\varepsilon) = \mathcal{L}(\varepsilon) = \{\varepsilon\}$ for the empty queue. We extend $\gamma_p$ to act on abstract (machine or global) states in a way analogous to the extension of $\alpha_p$, by moving it inside to the queues occurring in those states.
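Since $\gamma_p$ is defined through the regular expression of Eq. (4), membership of a concrete queue in a concretization can be tested with an ordinary regex engine. A minimal sketch, assuming single-character events and using Python's `re` module; `concretization_regex` and `in_gamma` are illustrative names.

```python
import re

def concretization_regex(prefix, suffix):
    """Compile RE_p of Eq. (4) for the abstract queue prefix | suffix."""
    parts = [re.escape(prefix)]
    seen = ""
    for e in suffix:
        seen += e
        # e_i followed by any number of events from {e_p, ..., e_i}
        parts.append(re.escape(e) + "[" + re.escape(seen) + "]*")
    return re.compile("".join(parts))

def in_gamma(queue, prefix, suffix):
    """True iff the concrete queue belongs to gamma_p(prefix | suffix)."""
    return concretization_regex(prefix, suffix).fullmatch(queue) is not None

print(in_gamma("bbbab", "bb", "ba"))  # True
print(in_gamma("bbab", "bb", "ba"))   # False: the first suffix event must be b
```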

#### **4.2 Abstract Convergence Detection**

Recall that finiteness and monotonicity of the sequence $(\overline{R}_k)_{k=0}^{\infty}$ guarantee its convergence, so it is natural to try to compute the limit. We summarize our overall procedure for doing so in Algorithm 1. The procedure iteratively increases the queue bound $k$ and computes the concrete and (via $\alpha_p$-projection) the abstract reachability sets $R_k$ and $\overline{R}_k$. If, for some $k$, an error is detected, the procedure terminates (Lines 4–5; in practice implemented as an on-the-fly check).

**Algorithm 1.** Queue-unbounded reachability analysis

**Input**: CQS with transition relation $\to$, $p \in \mathbb{N}$, property $\Phi$ respected by $\alpha_p$.

1: **compute** $R_0$; $\overline{R}_0 := \alpha_p(R_0)$

2: **for** $k := 1$ **to** $\infty$ **do**

3: **compute** $R_k$; $\overline{R}_k := \alpha_p(R_k)$

4: **if** $\exists r \in R_k : r \not\models \Phi$ **then**

5: **return** "error reachable with queue bound $k$"

6: **if** $|\overline{R}_k| = |\overline{R}_{k-1}|$ **then**

7: $\overline{T} := (\alpha_p \circ Im_{deq} \circ \gamma_p)(\overline{R}_k)$ ▷ *partial* **best abstract transformer**

8: **if** $\overline{T} \subseteq \overline{R}_k$ **then**

9: **return** "safe for any queue bound"

The key of the algorithm is reflected in Lines 6–9 and is based on the following idea (all claims are proved as part of Theorem 4 below). If the computation of $\overline{R}_k$ reveals no new abstract states in round $k$ (Line 6; by monotonicity, "same size" implies "same sets"), we apply the *best abstract transformer* [9,27] $\overline{Im} := \alpha_p \circ Im_{\to} \circ \gamma_p$ to $\overline{R}_k$: if the result is contained in $\overline{R}_k$, the abstract reachability sequence has converged. However, we can do better: we can restrict the successor function $Im_{\to}$ of the CQS to *dequeue* actions, denoted $Im_{deq}$ in Line 7. The ultimate reason is that firing a local or transmit action on two $\alpha_p$-equivalent states $r$ and $s$ results again in $\alpha_p$-equivalent states $r'$ and $s'$. This fact does *not* hold for dequeue actions: the successors $r'$ and $s'$ of dequeues depend on the abstracted parts of $r$ and $s$, resp., which may differ and become "visible" during the dequeue (e.g. the event behind the queue head moves into the head position). Our main result therefore is: if $\overline{R}_k = \overline{R}_{k-1}$ and dequeue actions do not create new abstract states (Lines 7 and 8), the sequence $(\overline{R}_k)_{k=0}^{\infty}$ has converged:

**Theorem 4.** *If* $\overline{R}_k = \overline{R}_{k-1}$ *and* $\overline{T} \subseteq \overline{R}_k$*, then for any* $K \ge k$*,* $\overline{R}_K = \overline{R}_k$*.*

If the sequence of reachable abstract states has converged, then **all** reachable concrete states (for any $k$) belong to $\gamma_p(\overline{R}_k)$ (for the current $k$). Since the abstraction function $\alpha_p$ respects property $\Phi$, we know that if any reachable concrete state violated $\Phi$, so would every other concrete state that maps to the same abstraction. However, for each abstract state in $\overline{R}_k$, Line 4 has examined at least one state $r$ in its concretization, and no violation was found. We conclude:

**Corollary 5.** *Line 9 of Algorithm 1 correctly asserts that no reachable concrete state of the given CQS violates* Φ*.*

The corollary (along with the earlier statement about Lines 4–5) confirms the partial correctness of Algorithm 1. The procedure is, however, necessarily incomplete: if no error is detected and the convergence condition in Line 8 never holds, the **for** loop will run forever.
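The control flow of Algorithm 1 can be sketched as a higher-order Python function. This is an illustration under stated assumptions, not the Pat implementation: the callbacks `compute_reach`, `alpha`, `abstract_deq_image`, and `violates` are hypothetical stand-ins for bounded exploration, $\alpha_p$, the partial best abstract transformer, and the check $r \not\models \Phi$, and the parameter `max_k` (absent from Algorithm 1) cuts off the loop, which may otherwise run forever.

```python
from typing import Callable, Hashable, Set

def quba(compute_reach: Callable[[int], Set[Hashable]],
         alpha: Callable[[Hashable], Hashable],
         abstract_deq_image: Callable[[Set[Hashable]], Set[Hashable]],
         violates: Callable[[Hashable], bool],
         max_k: int = 100) -> str:
    """Sketch of Algorithm 1's loop; gives up after max_k rounds (not in the original)."""
    prev = {alpha(s) for s in compute_reach(0)}             # Line 1
    for k in range(1, max_k + 1):                           # Line 2
        concrete = compute_reach(k)                         # Line 3
        abstract = {alpha(s) for s in concrete}
        if any(violates(r) for r in concrete):              # Lines 4-5
            return f"error reachable with queue bound {k}"
        if len(abstract) == len(prev):                      # Line 6 (same size => same set)
            image = abstract_deq_image(abstract)            # Line 7
            if image <= abstract:                           # Line 8
                return "safe for any queue bound"           # Line 9
        prev = abstract
    return "unknown (gave up at max_k)"

# Toy stand-ins: concrete states are queue lengths, alpha caps them at 2,
# and the abstract dequeue image shortens every abstract queue by one.
print(quba(lambda k: set(range(k + 1)), lambda n: min(n, 2),
           lambda A: {max(a - 1, 0) for a in A}, lambda s: False))
```

In the toy run, the abstract sets first repeat at $k = 3$, and the dequeue image stays inside them, so the sketch reports safety.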

We conclude this part with two comments. First, note that we do not compute the sets $\overline{R}_k$ as reachability fixpoints in the abstract domain (i.e. the domain of $\alpha_p$). Instead, we compute the *concrete* reachability sets first, and then obtain the $\overline{R}_k$ via projection (Lines 1 and 3). The reason is that the projection gives us the *exact* set of abstractions of reachable concrete states, while an abstract fixpoint likely overapproximates (for instance, the best abstract transformer from Line 7 does) and loses precision. Note that a primary motivation for computing abstract fixpoints, namely that the concrete fixpoint may not be computable, does not apply here: the concrete domains are finite, for each $k$.

Second, we observe that this projection technique comes with a cost: the sequence $(\overline{R}_k)_{k=0}^{\infty}$ may *stutter* at intermediate moments: $\overline{R}_k \subsetneq \overline{R}_{k+1} = \overline{R}_{k+2} \subsetneq \overline{R}_{k+3}$. The reason is that $\overline{R}_{k+3}$ is not obtained as a functional image of $\overline{R}_{k+2}$, but by projection from $R_{k+3}$. As a consequence, we cannot short-cut the convergence detection by just "waiting" for $(\overline{R}_k)_{k=0}^{\infty}$ to stabilize, despite the finite domain.

#### **4.3 Computing Partial Best Abstract Transformers**

Recall that in Line 7 we compute

$$\overline{T} = \overline{Im}\_{deq}(\overline{R}\_k) = (\alpha\_p \circ Im\_{deq} \circ \gamma\_p)(\overline{R}\_k) \,. \tag{5}$$

The line applies the best abstract transformer, restricted to dequeue actions, to $\overline{R}_k$. This result cannot be computed as defined in (5), since $\gamma_p(\overline{R}_k)$ is typically infinite. However, $\overline{R}_k$ is finite, so we can iterate over $\bar{r} \in \overline{R}_k$, and little information is actually needed to determine the abstract successors of $\bar{r}$. The "infinite fragment" of $\bar{r}$ remains unchanged, which makes the action implementable.

Formally, let $\bar{r} = (\ell, \overline{\mathcal{Q}})$ with $\overline{\mathcal{Q}} = e_0 e_1 \dots e_{p-1} \mid e_p e_{p+1} \dots e_{z-1}$. To apply a dequeue action to $\bar{r}$, we first perform local-state updates on $\ell$ as required by the action, resulting in $\ell'$. Now consider $\overline{\mathcal{Q}}$. The first suffix event, $e_p$, moves into the prefix due to the dequeue. We do not know whether there are later occurrences of $e_p$ before or after the first suffix occurrences of $e_{p+1} \dots e_{z-1}$. This information determines the possible abstract queues resulting from the dequeue. To compute the exact best abstract transformer, we enumerate these possibilities:

$$\overline{Im}_{deq}(\{(\ell, \overline{\mathcal{Q}})\}) = \left\{ (\ell', \overline{\mathcal{Q}}') : \overline{\mathcal{Q}}' \in \left\{ \begin{array}{l} e_1 \dots e_p \mid e_{p+1} e_{p+2} \dots e_{z-1} \\ e_1 \dots e_p \mid \boxed{e_p}\, e_{p+1} e_{p+2} \dots e_{z-1} \\ e_1 \dots e_p \mid e_{p+1} \boxed{e_p}\, e_{p+2} \dots e_{z-1} \\ \quad \vdots \\ e_1 \dots e_p \mid e_{p+1} e_{p+2} \dots e_{z-1} \boxed{e_p} \end{array} \right\} \right\}$$

The first case for $\overline{\mathcal{Q}}'$ applies if there are no occurrences of $e_p$ in the suffix after the dequeue. The remaining cases enumerate possible positions of the *first* occurrence of $e_p$ (boxed, for readability) in the suffix after the dequeue. The cost of this enumeration is linear in the length of the suffix of the abstract queue.
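The enumeration above needs only the prefix and suffix of the abstract queue. The sketch below computes the queue component of $\overline{Im}_{deq}$, assuming single-character events; the local-state update from $\ell$ to $\ell'$ and the treatment of defer are deliberately left out, and the function name `abstract_dequeue` is hypothetical. For a non-empty suffix it returns $|\textit{suff}| + 1$ abstract queues, matching the linear cost noted above.

```python
def abstract_dequeue(prefix, suffix):
    """Queue part of Im_deq for prefix | suffix: all possible abstract queues after a dequeue."""
    if not suffix:
        # Nothing was abstracted away: the queue is concrete, so just drop the head.
        return {(prefix[1:], "")} if prefix else set()
    moved = suffix[0]                   # e_p leaves the suffix
    new_prefix = (prefix + moved)[1:]   # head dequeued; e_p joins the prefix (if p > 0)
    rest = suffix[1:]                   # e_{p+1} ... e_{z-1}
    results = {(new_prefix, rest)}      # case 1: no later occurrence of e_p
    for pos in range(len(rest) + 1):    # first later occurrence of e_p at index pos
        results.add((new_prefix, rest[:pos] + moved + rest[pos:]))
    return results

print(sorted(abstract_dequeue("bb", "ba")))  # [('bb', 'a'), ('bb', 'ab'), ('bb', 'ba')]
```

For $p = 0$ the same code dequeues $e_0 = e_p$ directly, since the new prefix is then empty.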

Since our list abstraction maintains the first occurrence of each event, the semantics of defer (see the *Discussion* in Sect. 4.1) can be implemented abstractly without loss of information (not shown above, for simplicity).

#### **5 Abstract Queue Invariant Checking**

The abstract transformer function in Sect. 4 is used to decide whether the sequence $(\overline{R}_k)_{k=0}^{\infty}$ has converged. Being an overapproximation, the function may generate *spurious* states: they are not reachable, i.e. no concretization of them is. Unfortunately for us, spurious abstract states always prevent convergence.

A key empirical observation is that concretizations of spurious abstract states often violate simple machine invariants, which can be proved from the perspective of a single machine, while collapsing all other machines into a nondeterministically behaving environment. Consider our example from Sect. 2 for $p = 0$. It fails to converge since Line 7 generates an abstract state $\bar{s}$ that features a Done event followed by a Prime event in the Receiver's queue. A light-weight static analysis proves that the Sender's machine permits no path from the send Done to the send Prime statement. Since **every** concretization of $\bar{s}$ features a Done followed by a Prime event, the abstract state $\bar{s}$ is spurious and can be eliminated.

Our tool assists users in *discovering* candidate machine invariants, by facilitating the inspection of states in $\overline{T} \setminus \overline{R}_k$ (which foil the test in Line 8). We *discharge* such invariants separately, via a simple sequential model check or static analysis. In this section we focus on the more interesting question of how to *use* them. Formally, suppose the P program comes with a *queue invariant* $I$, i.e. an invariant property of *concrete* queues. The *abstract invariant checking problem* is to decide, for a given abstract queue $\overline{\mathcal{Q}}$, whether *every* concretization of $\overline{\mathcal{Q}}$ violates $I$; in this case, and only in this case, an abstract state containing $\overline{\mathcal{Q}}$ can be eliminated. In the following we define a language QuTL for specifying concrete queue invariants (Sect. 5.1), and then show how checking an abstract queue against a QuTL invariant can be efficiently solved as a model checking problem (Sect. 5.2).

#### **5.1 Queue Temporal Logic (QuTL)**

Our logic to express invariant properties of queues is a form of first-order linear-time temporal logic. This choice is motivated by the logic's ability to constrain the order (via temporal operators) and multiplicity of queue events, the latter via relational operators that express conditions on the number of event occurrences.

*Queue Relational Expressions (QuRelE).* These are of the form $\#e \rhd c$, where $e \in \Sigma$ (queue alphabet), $\rhd \in \{<, \le, =, \ge, >\}$, and $c \in \mathbb{N}$ is a literal natural number. The *value* of a QuRelE is defined as the Boolean

$$V(\#e \rhd c) \quad = \quad |\{i \in \mathbb{N} : 0 \le i < |\mathcal{Q}| \land \mathcal{Q}[i] = e\}| \quad \rhd c \tag{6}$$

where $|\cdot|$ denotes set cardinality and $\rhd$ is interpreted as the standard integer arithmetic relational operator. In the following we write $\mathcal{Q}[i \to]$ (read: "$\mathcal{Q}$ from $i$") for the queue obtained from queue $\mathcal{Q}$ by dropping the first $i$ events.

**Definition 6 (Syntax of QuTL).** *The following are QuTL formulas:*


*The set QuTL is the Boolean closure of the above set of formulas.*

**Definition 7 (Concrete semantics of QuTL).** *Concrete queue* <sup>Q</sup> *satisfies QuTL formula* φ*, written* Q |= φ*, depending on the form of* φ *as follows.*


*Satisfaction of Boolean combinations is defined as usual, e.g.* $\mathcal{Q} \models \neg\phi$ *iff* $\mathcal{Q} \not\models \phi$*. No other pair* $(\mathcal{Q}, \phi)$ *satisfies* $\mathcal{Q} \models \phi$*.*

For instance, formula #e ≤ 3 is true exactly for queues containing at most 3 e's, and formula G(#e ≥ 1) is true of Q iff Q is empty or its final event (!) is e. See App. B of [23] for more examples.
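Definitions 6 and 7 are not reproduced in full here, so the following concrete checker is only a sketch of a QuTL fragment under assumed semantics: an atomic event $e$ holds iff $e$ is at the queue head, $\mathbf{G}\,\phi$ and $\mathbf{F}\,\phi$ quantify over the suffixes $\mathcal{Q}[i \to]$ with $0 \le i < |\mathcal{Q}|$, and $\#e \rhd c$ follows Eq. (6). These assumptions agree with the two examples just given; $\mathbf{X}$ is omitted because its behavior at the end of the queue is not stated here. The nested-tuple formula encoding and the name `holds` are our own.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt}

def holds(q, phi):
    """Concrete satisfaction q |= phi for a QuTL fragment (assumed semantics)."""
    tag = phi[0]
    if tag == "ev":                      # atomic e: e is at the queue head
        return len(q) > 0 and q[0] == phi[1]
    if tag == "cnt":                     # #e op c, as in Eq. (6)
        _, e, op, c = phi
        return OPS[op](sum(1 for x in q if x == e), c)
    if tag == "not":
        return not holds(q, phi[1])
    if tag == "and":
        return holds(q, phi[1]) and holds(q, phi[2])
    if tag == "or":
        return holds(q, phi[1]) or holds(q, phi[2])
    if tag == "implies":
        return (not holds(q, phi[1])) or holds(q, phi[2])
    if tag == "G":                       # every suffix q[i:], 0 <= i < |q|
        return all(holds(q[i:], phi[1]) for i in range(len(q)))
    if tag == "F":                       # some suffix q[i:], 0 <= i < |q|
        return any(holds(q[i:], phi[1]) for i in range(len(q)))
    raise ValueError(f"unknown QuTL operator {tag!r}")

# Property (2): G(Done => G not Prime)
prop2 = ("G", ("implies", ("ev", "Done"), ("G", ("not", ("ev", "Prime")))))
print(holds(["Prime", "Prime", "Done", "Ping"], prop2))  # True
print(holds(["Done", "Prime"], prop2))                   # False
```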

Algorithmically checking whether a concrete queue $\mathcal{Q}$ satisfies a QuTL formula $\phi$ is straightforward, since $\mathcal{Q}$ is of fixed size and straight-line. The situation is different with abstract queues. Our motivation here is to declare that an abstract queue $\overline{\mathcal{Q}}$ *violates* a formula $\phi$ if *all its concretizations* (Definition 3) do: under this condition, if $\phi$ is an invariant, we know $\overline{\mathcal{Q}}$ is not reachable. Equivalently:

**Definition 8 (Abstract semantics of QuTL).** *Abstract queue* $\overline{\mathcal{Q}}$ *satisfies QuTL formula* $\phi$*, written* $\overline{\mathcal{Q}} \models_p \phi$*, if some concretization of* $\overline{\mathcal{Q}}$ *satisfies* $\phi$*:*

$$\overline{\mathcal{Q}} \models_p \phi \quad := \quad \exists \mathcal{Q} \in \gamma_p(\overline{\mathcal{Q}}) : \mathcal{Q} \models \phi. \tag{7}$$

For example, we have $bb \mid ba \models_2 \mathbf{G}(a \Rightarrow \mathbf{G}\,\neg b)$ since, for instance, $bbba \in \gamma_2(bb \mid ba)$ satisfies the formula. See App. B of [23] for more examples.

**Fig. 2.** LTS for $\overline{\mathcal{Q}} = bb \mid abc$ ($p = 2$), with label sets written below each state. The blue and red parts encode the concretizations of the prefix and suffix of $\overline{\mathcal{Q}}$, resp. (Color figure online)

#### **5.2 Abstract QuTL Model Checking**

A QuTL *constraint* is a QuTL formula without Boolean connectives. We first describe how to model check against QuTL constraints, and come back to Boolean connectives at the end of Sect. 5.2.

Model checking an abstract queue $\overline{\mathcal{Q}}$ against a QuTL constraint $\phi$, i.e. checking whether some concretization of $\overline{\mathcal{Q}}$ satisfies $\phi$, can be reduced to a standard model checking problem over a labeled transition system (LTS) $M = (S, T, L)$ with states $S$, transitions $T$, and a labeling function $L\colon S \to 2^{\Sigma} \cup \{\varepsilon\}$. The LTS *characterizes* the concretization $\gamma_p(\overline{\mathcal{Q}})$ of $\overline{\mathcal{Q}}$, as illustrated in Fig. 2 using an example: the concretizations of $\overline{\mathcal{Q}}$ are formed from the regular-expression traces generated by paths of $\overline{\mathcal{Q}}$'s LTS that end in the double-circled green state.

The straightforward construction of the LTS $M$ is formalized in App. A.2 of [23]. Its size is linear in $|\overline{\mathcal{Q}}|$: $|S| = p + 2 \times (|\overline{\mathcal{Q}}| - p) + 1$ and $|T| = p + 4 \times (|\overline{\mathcal{Q}}| - p)$.
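App. A.2 of [23] is not reproduced here, so the following LTS builder is a reconstruction of our own that matches the sizes quoted above: one state per prefix event, and for each suffix event $e_i$ one state labeled $\{e_i\}$ followed by an optional state labeled $\{e_p, \dots, e_i\}$ with a self-loop (the Kleene star in Eq. (4)), plus a final state labeled $\varepsilon$. It should be read as a plausible sketch, not as the tool's construction; the name `build_lts` is hypothetical and events are assumed to be single characters.

```python
def build_lts(prefix, suffix):
    """Hypothetical LTS for prefix | suffix: (label sets per state, transitions, final state)."""
    labels, trans, prev, seen = [], set(), [], []

    def new_state(label):
        labels.append(frozenset(label))   # the empty frozenset encodes the label epsilon
        return len(labels) - 1

    for e in prefix:                      # prefix: a simple chain, one state per event
        s = new_state({e})
        trans.update((q, s) for q in prev)
        prev = [s]
    for e in suffix:
        seen.append(e)
        ev_state = new_state({e})         # exactly one occurrence of e_i
        star = new_state(seen)            # zero or more events from {e_p, ..., e_i}
        trans.update((q, ev_state) for q in prev)
        trans.update({(ev_state, star), (star, star)})
        prev = [ev_state, star]           # the star state may be skipped
    final = new_state(())                 # the right-most ("green") state s_z
    trans.update((q, final) for q in prev)
    return labels, trans, final

labels, trans, final = build_lts("bb", "abc")   # the queue of Fig. 2
print(len(labels), len(trans))                  # 9 14, i.e. 2 + 2*3 + 1 and 2 + 4*3
```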

We call a path through $M$ *complete* if it ends in the right-most state $s_z$ of $M$ (green in Fig. 2). The labeling function extends to paths via $L(s_i \to \dots \to s_j) = L(s_i) \cdot \ldots \cdot L(s_j)$. This gives rise to the following characterization of $\gamma_p(\overline{\mathcal{Q}})$:

**Lemma 9.** *Given abstract queue* $\overline{\mathcal{Q}}$ *over alphabet* $\Sigma$*, let* $M = (S, T, L)$ *be its LTS. Then*

$$\gamma\_p(\overline{\mathcal{Q}}) \;= \bigcup \{ \mathcal{L}(L(\pi)) \in 2^{\Sigma^\*} \mid \pi \text{ is a complete path from } s\_0 \text{ in } M \}. \tag{8}$$

We say path $\pi$ *satisfies* $\phi$, written $\pi \models_p \phi$, if there exists $\mathcal{Q} \in \mathcal{L}(L(\pi))$ s.t. $\mathcal{Q} \models \phi$.

**Corollary 10.** *Let* $\overline{\mathcal{Q}}$ *and* $M$ *be as in Lemma 9, and* $\phi$ *a QuTL constraint. Then the following are equivalent.*

*1.* $\overline{\mathcal{Q}} \models_p \phi$*.*

*2. There exists a complete path* $\pi$ *from* $s_0$ *in* $M$ *such that* $\pi \models_p \phi$*.*

*Proof.* Immediate from Definition 8 and Lemma 9.

Given an abstract queue $\overline{\mathcal{Q}}$, its LTS $M$, and a QuTL constraint $\phi$, our abstract queue model checking algorithm is based on Corollary 10: we need to find a complete path from $s_0$ in $M$ that satisfies $\phi$. This is similar to standard model checking against existential temporal logics like ECTL, with two particularities:

First, paths must be complete. This poses no difficulty, as completeness is suffix-closed: a path ends in $s_z$ iff any of its suffixes does. This implies that temporal reductions on QuTL constraints work like in standard temporal logics. For example: there exists a complete path $\pi$ from $s_0$ in $M$ such that $\pi \models_p \mathbf{X}\,\phi$ iff there exists a complete path $\pi'$ from some successor $s_1$ of $s_0$ such that $\pi' \models_p \phi$.

Second, we have domain-specific atomic (non-temporal) propositions. These are accommodated as follows, for an arbitrary start state s ∈ S:

$\exists \pi : \pi$ **from** $s$ **complete and** $\pi \models_p e$ **(for** $e \in \Sigma$**):** this is true iff $e \in L(s)$, as is immediate from the $\mathcal{Q} \models e$ case in Definition 7.

$\exists \pi : \pi$ **from** $s$ **complete and** $\pi \models_p \#e > c$ **(for** $e \in \Sigma, c \in \mathbb{N}$**):** this is true iff

– the number of states reachable from s labeled e is greater than c, **or**

– there exists a state reachable from s labeled with e that has a self-loop.

The other relational expressions $\#e \rhd c$ are checked similarly.
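The two atomic cases translate into simple graph queries. The sketch below re-creates a hypothetical LTS layout (our reconstruction, matching the sizes $|S|$ and $|T|$ given earlier, not the construction of [23]) and then checks $\pi \models_p e$ and $\pi \models_p \#e > c$ from a start state by reachability; all function names are illustrative. Because this LTS is a chain with optional looping states, a single complete path can visit every reachable state, which is why counting reachable $e$-labeled states suffices.

```python
def build_lts(prefix, suffix):
    """Hypothetical LTS for prefix | suffix (single-character events)."""
    labels, trans, prev, seen = [], set(), [], []
    def new_state(label):
        labels.append(frozenset(label))
        return len(labels) - 1
    for e in prefix:
        s = new_state({e})
        trans.update((q, s) for q in prev)
        prev = [s]
    for e in suffix:
        seen.append(e)
        ev_state, star = new_state({e}), new_state(seen)
        trans.update((q, ev_state) for q in prev)
        trans.update({(ev_state, star), (star, star)})
        prev = [ev_state, star]
    final = new_state(())
    trans.update((q, final) for q in prev)
    return labels, trans, final

def reachable(trans, s):
    """All states reachable from s, including s itself."""
    todo, seen = [s], {s}
    while todo:
        u = todo.pop()
        for a, b in trans:
            if a == u and b not in seen:
                seen.add(b)
                todo.append(b)
    return seen

def exists_head(labels, s, e):
    """Some complete path from s satisfies atomic e  iff  e in L(s)."""
    return e in labels[s]

def exists_count_gt(labels, trans, s, e, c):
    """Some complete path from s satisfies #e > c (the two-case criterion above)."""
    with_e = [u for u in reachable(trans, s) if e in labels[u]]
    return len(with_e) > c or any((u, u) in trans for u in with_e)

labels, trans, final = build_lts("bb", "abc")
print(exists_count_gt(labels, trans, 0, "b", 5))  # True: a b-labeled state has a self-loop
```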

*Boolean Connectives.* Let now $\phi$ be a full-fledged QuTL formula. We first bring it into negation normal form, by pushing negations inside, exploiting the usual dualities $\neg\mathbf{X} = \mathbf{X}\neg$, $\neg\mathbf{F} = \mathbf{G}\neg$, and $\neg\mathbf{G} = \mathbf{F}\neg$. The subset $\rhd \in \{<, \le, \ge, >\}$ of the queue relational expressions is semantically closed under negation; "$\neg=$" is replaced by "$> \vee <$". A path $\pi$ from $s$ satisfies $\neg e$ (for $e \in \Sigma$) iff $L(s) \neq \{e\}$: this condition states that either $L(s) = \varepsilon$, or there exists some label other than $e$ in $L(s)$, so the *existential* property $\neg e$ holds.

Disjunctions are handled by distributing $\models_p$ over them: $\overline{\mathcal{Q}} \models_p \phi_1 \vee \phi_2$ iff $\overline{\mathcal{Q}} \models_p \phi_1 \vee \overline{\mathcal{Q}} \models_p \phi_2$. What remains are conjunctions. The existential flavor of $\models_p$ implies that $\models_p$ does *not* distribute over them; see Ex. 13 in App. B.1 of [23]. Suppose we ignore this and replace a check of the form $\overline{\mathcal{Q}} \models_p \phi_1 \wedge \phi_2$ by the **weaker** check $\overline{\mathcal{Q}} \models_p \phi_1 \wedge \overline{\mathcal{Q}} \models_p \phi_2$, which may produce false positives. Now consider how we use these results: if $\overline{\mathcal{Q}} \models_p \phi$ holds, we decide to *keep* the state containing the abstract queue. False positives during abstract model checks therefore may create extra work, but do not introduce unsoundness. In summary, our abstract model checking algorithm soundly approximates conjunctions, but remains exact for the purely disjunctive fragment of QuTL.
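A small bounded experiment shows why conjunctions are only approximated. The sketch enumerates concretizations of the abstract queue with empty prefix and suffix $ab$ ($p = 0$) up to a fixed length, which is a brute-force stand-in for $\models_p$ rather than the LTS-based algorithm, and compares the exact check of $\phi_1 \wedge \phi_2$ with the weaker per-conjunct check. The conjuncts $\#a \ge 2$ and $\#a \le 1$ are our own illustrative choice (not Ex. 13 of [23]) and are written as Python predicates; events are assumed to be single characters.

```python
import itertools
import re

def concretizations(prefix, suffix, max_len):
    """Enumerate gamma_p(prefix | suffix) up to max_len events (bounded, for illustration)."""
    parts, seen = [re.escape(prefix)], ""
    for e in suffix:
        seen += e
        parts.append(re.escape(e) + "[" + re.escape(seen) + "]*")
    pattern = re.compile("".join(parts))
    alphabet = sorted(set(prefix + suffix))
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            w = "".join(letters)
            if pattern.fullmatch(w):
                yield w

def sat_p(prefix, suffix, pred, max_len=6):
    """Bounded stand-in for (prefix | suffix) |=_p phi, with phi given as a predicate."""
    return any(pred(q) for q in concretizations(prefix, suffix, max_len))

phi1 = lambda q: q.count("a") >= 2
phi2 = lambda q: q.count("a") <= 1
exact = sat_p("", "ab", lambda q: phi1(q) and phi2(q))    # no queue satisfies both
weaker = sat_p("", "ab", phi1) and sat_p("", "ab", phi2)  # "aab" and "ab" are witnesses
print(exact, weaker)  # False True
```

The weaker check keeps a state that the exact check would discard, which costs extra work but, as argued above, never causes unsoundness.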

**Table 1.** Results: #*M*: number of P machines; *Loc*: number of lines of code; *Safe*? = ✓: property holds; p: *minimum* unabstracted prefix required for convergence; kmax: point of convergence or of exposed bugs (– means divergence); *Time*: runtime (s); *Mem.*: memory usage (MB).


#### **6 Empirical Evaluation**

We implemented the proposed approaches in C# atop the bounded model checker PTester [11], an analysis tool for P programs. PTester employs a bounded exploration strategy similar to Zing [4]. We denote by Pat the implementation of Algorithm 1, and by Pat+I the version with queue invariants ("Pat+ Invariants"). A detailed introduction to tool design and implementation is available online [22].

*Experimental Goals.* We evaluate the approaches against the following questions:

**Q1.** Is Pat effective, i.e., does it converge for many programs, and for which values of k?

**Q2.** What is the impact of the QuTL invariant checking?

*Experimental Setup.* We collected a set of P programs (available online [22]); most have been used in previous publications:

**1–5:** protocols implemented in P: the German Cache Coherence protocol with different numbers of clients (**1–2**) [11], a buggy version of a token ring protocol [11] and a fixed version (**3–4**), and a failure detector protocol from [25] (**5**).

**6–7:** two device drivers where OSR is used for testing USB devices [10].

**8–14:** miscellaneous: **8–10** [25], **11** [15], **12** is the example from Sect. 2, **13–14** are the buggy and fixed versions of an Elevator controller [11].

We conduct two types of experiments: (i) we run Pat on each benchmark to empirically answer **Q1**; (ii) we run Pat+I on the examples which fail to verify in (i) to answer **Q2**. All experiments are performed on a 2.80 GHz Intel(R) Core(TM) i7-7600 machine with 8 GB memory, running 64-bit Windows 10. The timeout is set to 3600 s (1h); the memory limit to 4 GB.

**Results.** Table 1 shows, first, that Pat converges on *almost all* safe examples (and successfully exposes the bugs in the unsafe ones). Second, in most cases the value kmax at which convergence was detected is small, 5 or less. This is what enables the use of this technique in practice: the exploration space grows fast with k, so early convergence is critical. Note that kmax is guaranteed to be the smallest value for which the respective example converges. For convergent examples, the verification succeeded fully automatically: the queue abstraction prefix parameter p is incremented in a loop whenever the current value of p causes a spurious abstract state.

The German protocol does not converge in reasonable time. In this case, we request minimal manual assistance from the designer. Our tool inspects spurious abstract states, compares them to actually reached abstract states, and suggests candidate invariants to exclude them. We describe the process of invariant discovery, and why and how the resulting invariants are easy to prove, in [22].

The following table shows the invariants that make the German protocol converge, and the resulting times and memory consumption.


The invariant states that there is always at most one exclusive request and at most one shared request in the Server or Client machine's queue.

*Performance Evaluation.* We finally consider the following question: *To perform full verification, how much overhead does* Pat *incur compared to PTester?* We iteratively run PTester with a queue bound from 1 up to kmax (from Table 1).

Comparing the running times of Pat and PTester, we observe that the difference is small in all cases, suggesting that turning PTester into a full verifier comes with little extra cost. Therefore, to improve Pat's scalability, the focus should be on the efficiency of the R<sup>k</sup> computation (Line 3 in Algorithm 1). Techniques that lend themselves here are *partial order reduction* [2,28] or *symmetry reduction* [29]. Note that our proposed approach is orthogonal to how these sets are computed.

#### **7 Related Work**

Automatic verification for asynchronous event-driven programs communicating via unbounded FIFO queues is undecidable [8], even when the agents are finite-state machines. To sidestep the undecidability, various remedies have been proposed. One is to underapproximate program behaviors using various bounding techniques; examples include depth-bounded [17] and context-bounded analysis [19,20,26], delay-bounding [13], bounded asynchrony [15], preemption-bounding [24], and phase-bounded analysis [3,6]. It has been shown that most of these bounding techniques admit a decidable model checking problem [19,20,26], and they have been used successfully in practice for finding bugs.

Gall et al. proposed an abstract interpretation of FIFO queues in terms of regular languages [16]. While our works share some basic insights about taming queues, the differences are fundamental: our abstract domain is *finite*, guaranteeing convergence of our sequence. In [16] the abstract domain is infinite; they propose a widening operator for fixpoint computation. More critically, we use the abstract domain *only* for convergence detection; the set of reachable states returned is in the end exact. As a result, we can prove and refute properties but may not terminate; [16] is inexact and cannot refute but always returns.

Several partial verification approaches for asynchronous message-passing programs have been presented recently [5,7,10]. In [5], Bakst et al. propose *canonical sequentialization*, which avoids exploring all interleavings by sequentializing concurrent programs. Desai et al. [10] propose an alternative, namely prioritizing receive actions over send actions. The approach is complete in the sense that it is able to construct *almost-synchronous invariants* that cover all reachable local states and hence suffice to prove local assertions. Similarly, Bouajjani et al. [7] propose an iterative analysis that bounds send actions in each interaction phase. It approaches completeness by checking a program's synchronizability under the bounds. Like our work, these three approaches are sound but incomplete. An experimental comparison against the techniques of [7,10] is not possible, since no tools implementing them are available; a comparison based on what is reported in the papers suggests that our approach is competitive in both performance and precision.

Our approach can be categorized as a *cutoff* detection technique [1,12,14,28]. Cutoffs are, however, typically determined statically, often leaving them too large for practical verification. Aiming at minimal cutoffs, our work is closer in nature to earlier *dynamic* strategies [18,21], which targeted different forms of concurrent programs. The *generator* technique proposed in [21] is unlikely to work for P programs, due to the large local state space of machines.

#### **8 Conclusion**

We have presented a method to verify safety properties of asynchronous event-driven programs of agents communicating via unbounded queues. Our approach is sound but incomplete: it can both prove (or, by encountering bugs, disprove) such properties but may not terminate. We empirically evaluated our method on a collection of P programs. Our experimental results show that our method can successfully prove the correctness of programs, and that such proofs come with little extra resource cost compared to plain state exploration. Future work includes an extension to P programs with sources of unboundedness other than the queue length (e.g. messages with integer *payloads*).

**Acknowledgments.** We thank Dr. Vijay D'Silva (Google, Inc.), for enlightening discussions about partial abstract transformers.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Inferring Inductive Invariants from Phase Structures**

Yotam M. Y. Feldman<sup>1</sup>(B), James R. Wilcox<sup>2</sup>, Sharon Shoham<sup>1</sup>, and Mooly Sagiv<sup>1</sup>

> <sup>1</sup> Tel Aviv University, Tel Aviv, Israel yotam.feldman@gmail.com <sup>2</sup> University of Washington, Seattle, USA

**Abstract.** Infinite-state systems such as distributed protocols are challenging to verify using interactive theorem provers or automatic verification tools. Of these techniques, deductive verification is highly expressive but requires the user to annotate the system with *inductive invariants*. To relieve the user from this labor-intensive and challenging task, *invariant inference* aims to find inductive invariants automatically. Unfortunately, when applied to infinite-state systems such as distributed protocols, existing inference techniques often diverge, which limits their applicability.

This paper proposes *user-guided invariant inference* based on *phase invariants*, which capture the different logical phases of the protocol. Users convey their intuition by specifying a *phase structure*, an automaton with edges labeled by program transitions; the tool automatically infers assertions that hold in the automaton's states, resulting in a full safety proof. The additional structure from phases guides the inference procedure towards finding an invariant.

Our results show that user guidance by phase structures facilitates successful inference beyond the state of the art. We find that phase structures are pleasantly well matched to the intuitive reasoning routinely used by domain experts to understand why distributed protocols are correct, so that providing a phase structure reuses this existing intuition.

#### **1 Introduction**

Infinite-state systems such as distributed protocols remain challenging to verify despite decades of work developing interactive and automated proof techniques. Such proofs rely on the fundamental notion of an *inductive invariant*. Unfortunately, specifying inductive invariants is difficult for users, who must often repeatedly iterate through candidate invariants before achieving an inductive invariant. For example, the Verdi project's proof of the Raft consensus protocol used an inductive invariant with 90 conjuncts and relied on significant manual proof effort [61,62].

The dream of *invariant inference* is that users would instead be assisted by automatic procedures that could infer the required invariants. While other domains have seen successful applications of invariant inference, using techniques such as abstract interpretation [18] and property-directed reachability [10,21], existing inference techniques fall short for interesting distributed protocols, and often diverge while searching for an invariant. These limitations have hindered adoption of invariant inference.

*Our Approach.* The idea of this paper is that invariant inference can be made drastically more effective by utilizing *user guidance* in the form of *phase structures*. We propose user-guided invariant inference, in which the user provides some additional information to guide the tool towards an invariant. An effective guidance method must (1) match users' high-level intuition of the proof, and (2) convey information in a way that an automatic inference tool can readily utilize to direct the search. In this setting, invariant inference turns a partial, high-level argument accessible to the user into a full, formal correctness proof, overcoming scenarios where procuring the proof completely automatically is unsuccessful.

Our approach places *phase invariants* at the heart of both user interaction and algorithmic inference. Phase invariants have an automaton-based form that is well-suited to the domain of distributed protocols. They allow the user to convey a high-level temporal intuition of why the protocol is correct in the form of a *phase structure*. The phase structure provides hints that direct the search and allow a more targeted generalization of states to invariants, which can facilitate inference where it is otherwise impossible.

This paper makes the following contributions:


Overall, our approach demonstrates that the seemingly inherent intractability of sifting through a vast space of candidate invariants can be mitigated by leveraging users' high-level intuition.

#### **2 Preliminaries**

In this section we provide background on first-order transition systems. Sorts are omitted for simplicity. Our results extend also to logics with a background theory.

*Notation. FV*(ϕ) denotes the set of free variables of ϕ. F<sub>Σ</sub>(V) denotes the set of first-order formulas over vocabulary Σ with *FV*(ϕ) ⊆ V. We write ∀V. ϕ ⟹ ψ to denote that the formula ∀V. ϕ → ψ is valid. We sometimes use f<sub>a</sub> as a shorthand for f(a).

*Transition Systems.* We represent transition systems symbolically, via formulas in first-order logic. The definitions are standard. A vocabulary Σ consisting of constant, function, and relation symbols is used to represent states. Post-states of transitions are represented by a copy of Σ denoted Σ′ = {a′ | a ∈ Σ}. A *first-order transition system* over Σ is a tuple *TS* = (*Init*, *TR*), where *Init* ∈ F<sub>Σ</sub>(∅) describes the initial states, and *TR* ∈ F<sub>Σ̂</sub>(∅) with Σ̂ = Σ ⊎ Σ′ describes the transition relation. The states of *TS* are first-order structures over Σ. A state s is initial if s |= *Init*. A transition of *TS* is a pair of states s<sub>1</sub>, s<sub>2</sub> over a shared domain such that (s<sub>1</sub>, s<sub>2</sub>) |= *TR*, (s<sub>1</sub>, s<sub>2</sub>) being the structure over that domain in which Σ is interpreted as in s<sub>1</sub> and Σ′ as in s<sub>2</sub>. s<sub>1</sub> is also called the *pre-state* and s<sub>2</sub> the *post-state*. Traces are finite sequences of states σ<sub>1</sub>, σ<sub>2</sub>, ... starting from an initial state such that there is a transition between each pair of consecutive states. The *reachable states* are those that reside on traces starting from an initial state.

*Safety.* A safety property P is a formula in F<sub>Σ</sub>(∅). We say that *TS* is *safe*, and that P is an *invariant*, if all the reachable states satisfy P. *Inv* ∈ F<sub>Σ</sub>(∅) is an *inductive invariant* if (i) *Init* ⟹ *Inv* (initiation), and (ii) *Inv* ∧ *TR* ⟹ *Inv*′ (consecution), where *Inv*′ is obtained from *Inv* by replacing each symbol from Σ with its primed counterpart. If also (iii) *Inv* ⟹ P (safety), then it follows that *TS* is safe.
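When the state space is finite, the three conditions can be checked directly by enumeration. The Python sketch below does this for a hypothetical toy system, a counter that moves in steps of 2 modulo 8. It checks the conditions state by state rather than as first-order validity, which is a simplification of the setting above. The sketch also shows the usual gap between invariance and inductiveness: P = (x ≠ 7) holds in every reachable state but is not inductive, whereas the stronger "x is even" is.

```python
def check_inductive_invariant(states, init, tr, inv, prop):
    """Check initiation, consecution and safety of inv over a finite state space."""
    initiation = all(inv(s) for s in states if init(s))
    consecution = all(inv(t) for s in states if inv(s) for t in states if tr(s, t))
    safety = all(prop(s) for s in states if inv(s))
    return initiation and consecution and safety


# Toy system (hypothetical): x starts at 0 and increases by 2 modulo 8.
STATES = range(8)


def init(x):
    return x == 0


def tr(x, y):
    return y == (x + 2) % 8


def prop(x):
    return x != 7


def even(x):
    return x % 2 == 0


print(check_inductive_invariant(STATES, init, tr, even, prop))  # True
print(check_inductive_invariant(STATES, init, tr, prop, prop))  # False: 5 -> 7
```

Using P itself as the candidate fails consecution because the unreachable state 5 satisfies P but steps to 7. This is exactly the kind of strengthening that inference has to discover.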

#### **3 Running Example: Distributed Key-Value Store**

We begin with a description of the running example we refer to throughout the paper.

The *sharded key-value store with retransmissions (KV-R)*, adapted from IronFleet [33, §5.2.1], is a distributed hash table where each node owns a subset of the keys, and keys can be dynamically transferred among nodes to balance load. The safety property ensures that each key is globally associated with one value, even in the presence of key transfers. Messages might be dropped by the network, and the protocol uses retransmissions and sequence numbers to maintain availability and safety.

Figure 1 shows code modeling the protocol in a relational first-order language akin to Ivy [45], which compiles to EPR transition systems. The state of nodes and the network is modeled by global relations. Lines 1 to 4 declare uninterpreted sorts for keys, values, clients, and sequence numbers. Lines 6 to 14 describe the state, consisting of: (i) local state of clients pertaining to the table (which nodes are owners of which keys, and the local shard of the table mapping keys to values); (ii) local state of clients pertaining to sent and received messages (seqnum\_sent, unacked, seqnum\_recvd); and (iii) the state of the network, consisting of two kinds of messages (transfer\_msg, ack\_msg). Each message kind is modeled as a relation whose first two arguments indicate the source and destination of the message, and the rest carry the message's payload. For example, ack\_msg is a relation over two nodes and a sequence number, with the intended meaning that a tuple (c<sub>1</sub>, c<sub>2</sub>, s) is in ack\_msg exactly when there is a message in the network from c<sub>1</sub> to c<sub>2</sub> acknowledging a message with sequence number s.

**Fig. 1.** Sharded key-value store with retransmissions (KV-R) in a first-order relational modeling.

The initial states are specified in Lines 17 to 18. Transitions are specified by the actions declared in Lines 20 to 66. Actions can fire nondeterministically at any time when their precondition (require statements) holds. Hence, the transition relation is the disjunction of the transition relations induced by the actions. The state is mutated by modifying the relations. For example, message sends are modeled by inserting a tuple into the corresponding relation (e.g. line 27), while message receives are modeled by requiring a tuple to be in the relation (e.g. line 32), and then removing it (e.g. line 33). The updates in lines 61 and 65 remove a set of tuples matching the pattern.
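To make the relational encoding concrete, here is a hypothetical Python rendering of the two message relations as sets of tuples. Fig. 1 itself is written in an Ivy-like language, not Python. The payload layout of transfer\_msg (key, value, sequence number) and all method names are our assumptions; only the send/receive/pattern-removal pattern follows the description above.

```python
class Network:
    """Hypothetical rendering of the network relations as sets of tuples."""

    def __init__(self):
        # transfer_msg(src, dst, key, value, seqnum): payload layout assumed.
        self.transfer_msg = set()
        # ack_msg(src, dst, seqnum), as described for Fig. 1.
        self.ack_msg = set()

    def send_transfer(self, src, dst, key, value, seq):
        # A send inserts a tuple into the relation.
        self.transfer_msg.add((src, dst, key, value, seq))

    def receive_transfer(self, src, dst, key, value, seq):
        # A receive first requires the tuple to be present (the precondition);
        # if it is absent, the action is not enabled.
        msg = (src, dst, key, value, seq)
        if msg not in self.transfer_msg:
            return False
        # ... and then removes it.
        self.transfer_msg.remove(msg)
        return True

    def remove_acks(self, src, dst):
        # A pattern update: remove every ack_msg tuple matching (src, dst, *).
        self.ack_msg = {m for m in self.ack_msg if (m[0], m[1]) != (src, dst)}


net = Network()
net.send_transfer("c1", "c2", "k", "v", 1)
print(net.receive_transfer("c1", "c2", "k", "v", 1))  # True: message consumed
print(net.receive_transfer("c1", "c2", "k", "v", 1))  # False: no longer enabled
```

Because the transition relation is a disjunction over actions, a nondeterministic scheduler for this sketch would pick any method whose precondition currently holds.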

Transferring keys between nodes begins by sending a transfer\_msg from the owner to a new node (line 20), which stores the key-value pair when it receives the message (line 39). Upon sending a transfer message, the original node cedes ownership (line 26) and does not send new transfer messages. Transfer messages may be dropped (line 30). To ensure that the key-value pair is not lost, retransmissions are performed (line 35) with the same sequence number until the target node acknowledges (which occurs in line 47). Acknowledge messages themselves may be dropped (line 53). Sequence numbers protect from delayed transfer messages, which might contain old values (line 42).

Lines 68 to 71 specify the key safety property: at most one value is associated with any key, anywhere in the network. Intuitively, the protocol satisfies this because each key k is either currently (1) *owned* by a node, in which case this node is unique, or (2) it is in the process of *transferring* between nodes, in which case the careful use of sequence numbers ensures that the destination of the key is unique. As is typical, it is not straightforward to translate this intuition into a full correctness proof. In particular, it is necessary to relate all the different components of the state, including clients' local state and pending messages.

*Invariant inference* strives to automatically find an inductive invariant establishing safety. This example is challenging for existing inference techniques (Sect. 6). This paper proposes *user-guided invariant inference* based on *phase invariants* to overcome this challenge. The rest of the paper describes our approach, in which inference is provided with the phase structure in Fig. 2, matching the high-level intuitive explanation above. The algorithm then automatically infers facts about each phase to obtain an inductive invariant. Sect. 4 describes phase structures and inductive phase invariants, and Sect. 5 explains how these are used in user-guided invariant inference.

#### **4 Phase Structures and Invariants**

In this section we introduce *phase structures* and *inductive phase invariants*. These are used for guiding automatic invariant inference in Sect. 5. Proofs appear in [24].

#### **4.1 Phase Invariants**

**Definition 1 (Quantified Phase Automaton).** *A* quantified phase automaton *(*phase automaton *for short) over* Σ *is a tuple* A = (Q, ι, V, δ, ϕ) *where:* Q *is a finite set of* phases*;* ι ∈ Q *is the initial phase;* V *is a set of variables, called the* automaton's quantifiers*;* δ : Q × Q → F<sub>Σ̂</sub>(V) *is a function labeling every pair of phases by a transition relation formula, such that FV*(δ<sub>(q,p)</sub>) ⊆ V *for every* (q, p) ∈ Q × Q*; and* ϕ : Q → F<sub>Σ</sub>(V) *is a function labeling every phase by a* phase characterization *formula, s.t. FV*(ϕ<sub>q</sub>) ⊆ V *for every* q ∈ Q*.*

Intuitively, V should be understood as free variables that are implicitly universally quantified outside of the automaton's scope. For each assignment to these variables, the automaton represents the progress along the phases from the point of view of this assignment, and thus V is also called the *view* (or *view quantifiers*).

We refer to (Q, ι, V, δ), where ϕ is omitted, as the *phase structure* (or the *automaton structure*) of A. We refer to R = {(q, p) ∈ Q × Q | δ<sub>(q,p)</sub> ≢ *false*} as the *edges* of A. A *trace* of A is a sequence of phases q<sub>0</sub>, ..., q<sub>n</sub> such that q<sub>0</sub> = ι and (q<sub>i</sub>, q<sub>i+1</sub>) ∈ R for every 0 ≤ i < n. We say that A is *deterministic* if for every (q, p<sub>1</sub>), (q, p<sub>2</sub>) ∈ R s.t. p<sub>1</sub> ≠ p<sub>2</sub>, the formula δ<sub>(q,p<sub>1</sub>)</sub> ∧ δ<sub>(q,p<sub>2</sub>)</sub> is unsatisfiable.

*Example 1.* Figure 2 shows a phase automaton for the running example, with the view of a single key k. It describes the protocol as transitioning between two distinct (logical) phases of k: *owned* (O[k]) and *transferring* (T[k]). The edges are labeled by actions of the system. A wildcard \* means that the action is executed with an arbitrary argument. The two central actions are (i) reshard, which transitions from O[k] to T[k] but cannot execute in T[k], and (ii) recv\_transfer\_message, which does the opposite. The rest of the actions do not cause a phase change and appear on a self-loop in each phase. Actions related to keys other than k are considered as self-loops and omitted here for brevity. Some actions are *disallowed* in certain phases, that is, they do not label *any* outgoing edge of that phase, such as recv\_transfer\_msg(*k*) in O[k]. *Characterizations* for each phase are depicted in Fig. 2 (bottom). Without them, Fig. 2 represents a *phase structure*, which serves as the input to our inference algorithm. We remark that the choice of automaton aims to reflect the safety property of interest. In our example, one might instead imagine taking the view of a single node as it interacts with multiple keys, which might seem intuitive from the standpoint of implementing the system. However, this view is not appropriate for the proof of value uniqueness, since keys pass in and out of the view of a single client.

**Fig. 2.** Phase structure for key-value store (top) and phase characterizations (bottom). The user provides the phase structure, and inference automatically produces the phase characterizations, forming a safe inductive phase automaton.
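The structural notions of Definition 1 (edges, traces, determinism) and the disallowed actions of Example 1 can be sketched over the phase structure of Fig. 2. As a simplifying assumption, each δ(q,p) is abstracted as a set of action names, with the empty set standing for *false*, so that determinism reduces to disjointness of label sets. The label "other" is a hypothetical placeholder for the remaining self-loop actions, and all function names are ours.

```python
# delta maps each pair of phases to the set of action names labeling that edge.
DELTA = {
    ("O", "O"): {"other"},
    ("O", "T"): {"reshard"},
    ("T", "T"): {"other"},
    ("T", "O"): {"recv_transfer_msg"},
}


def edges(delta):
    """R: the pairs whose label is not 'false' (here: a non-empty action set)."""
    return {qp for qp, acts in delta.items() if acts}


def is_deterministic(delta):
    """No action may label two distinct outgoing edges of the same phase."""
    R = sorted(edges(delta))
    for (q1, p1) in R:
        for (q2, p2) in R:
            if q1 == q2 and p1 != p2 and delta[(q1, p1)] & delta[(q2, p2)]:
                return False
    return True


def disallowed(delta, phases, actions):
    """Map each phase to the actions that label none of its outgoing edges."""
    result = {}
    for q in phases:
        enabled = set()
        for (src, _dst), acts in delta.items():
            if src == q:
                enabled |= acts
        result[q] = set(actions) - enabled
    return result


def is_trace(delta, initial, seq):
    """q0, ..., qn is a trace if q0 is the initial phase and consecutive phases form edges."""
    R = edges(delta)
    return bool(seq) and seq[0] == initial and all(
        (a, b) in R for a, b in zip(seq, seq[1:]))


print(is_deterministic(DELTA))  # True
print(disallowed(DELTA, ["O", "T"], ["reshard", "recv_transfer_msg", "other"]))
# {'O': {'recv_transfer_msg'}, 'T': {'reshard'}}
```

Representing δ by action sets loses the argument structure (k versus other keys); in the actual setting δ(q,p) is a transition formula and determinism is a satisfiability question.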

We now formally define *phase invariants* as phase automata that overapproximate the behaviors of the original system.

**Definition 2 (Language of Phase Automaton).** *Let* A *be a quantified phase automaton over* Σ*, and* σ = σ<sub>0</sub>, ..., σ<sub>n</sub> *a finite sequence of states over* Σ*, all with domain* D*. Let* v : V → D *be a valuation of the automaton quantifiers. We say that:*

*–* σ, v |= A *if there exists a trace of phases* q<sub>0</sub>, ..., q<sub>n</sub> *such that* (σ<sub>i</sub>, σ<sub>i+1</sub>), v |= δ<sub>(q<sub>i</sub>,q<sub>i+1</sub>)</sub> *for every* 0 ≤ i < n *and* σ<sub>i</sub>, v |= ϕ<sub>q<sub>i</sub></sub> *for every* 0 ≤ i ≤ n*.*

*–* σ |= A *if* σ, v |= A *for every valuation* v*.*

*The language of* A *is* L(A) = {σ | σ |= A}*.*
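Since Definition 2 asks for the existence of a phase trace, membership of a finite, non-empty sequence can be decided by tracking the set of phases consistent with each prefix, as when simulating a nondeterministic automaton. The Python sketch below assumes that δ and ϕ are given as Python predicates and that the domain D is finite. The toy model (a state is the set of keys currently in transfer, viewed per key k) is hypothetical and only loosely inspired by Fig. 2.

```python
from itertools import product


def accepts_with(phases, initial, delta, phi, sigma, v):
    """sigma, v |= A for a non-empty sequence sigma and a fixed valuation v."""
    # Start in the initial phase if its characterization holds in sigma[0].
    current = {initial} if phi[initial](sigma[0], v) else set()
    for pre, post in zip(sigma, sigma[1:]):
        # Follow edges whose label holds on (pre, post) and whose target
        # characterization holds in post; absent keys mean delta(q, p) = false.
        current = {p for q in current for p in phases
                   if (q, p) in delta and delta[(q, p)](pre, post, v)
                   and phi[p](post, v)}
    return bool(current)


def accepts(phases, initial, delta, phi, sigma, quantifiers, domain):
    """sigma |= A: the above holds for every valuation of the quantifiers."""
    return all(
        accepts_with(phases, initial, delta, phi, sigma, dict(zip(quantifiers, vals)))
        for vals in product(domain, repeat=len(quantifiers)))


# Hypothetical toy: phase O[k] while k is not in transfer, T[k] while it is.
PHASES = ["O", "T"]
DELTA = {
    ("O", "O"): lambda s, t, v: v["k"] not in s and v["k"] not in t,
    ("O", "T"): lambda s, t, v: v["k"] not in s and v["k"] in t,
    ("T", "T"): lambda s, t, v: v["k"] in s and v["k"] in t,
    ("T", "O"): lambda s, t, v: v["k"] in s and v["k"] not in t,
}
PHI = {"O": lambda s, v: v["k"] not in s, "T": lambda s, v: v["k"] in s}

sigma = [frozenset(), frozenset({"k1"}), frozenset()]
print(accepts(PHASES, "O", DELTA, PHI, sigma, ["k"], ["k1", "k2"]))  # True
```

The universal quantification over valuations is what makes the automaton "quantified": every key k must be able to explain the whole sequence from its own point of view.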

**Definition 3 (Phase Invariant).** *A phase automaton* A *is a* phase invariant *for a transition system TS if* L(*TS*) ⊆ L(A)*, where* L(*TS*) *denotes the set of finite traces of TS.*

*Example 2.* The phase automaton of Fig. 2 is a *phase invariant* for the protocol: intuitively, whenever an execution of the protocol reaches a phase, its characterizations hold. This fact may not be straightforward to establish. To this end we develop the notion of *inductive* phase invariants.

#### **4.2 Establishing Safety and Phase Invariants with Inductive Phase Invariants**

To establish phase invariants, we use inductiveness:

**Definition 4 (Inductive Phase Invariant).** A *is* inductive w.r.t. *TS* = (*Init*, *TR*) *if:*

– **Initiation:** *Init* ⟹ (∀V. ϕ<sub>ι</sub>).

– **Inductiveness:** *for all* (q, p) ∈ R*,* ∀V. (ϕ<sub>q</sub> ∧ δ<sub>(q,p)</sub> ⟹ ϕ′<sub>p</sub>).

– **Edge Covering:** *for every* q ∈ Q*,* ∀V. (ϕ<sub>q</sub> ∧ *TR* ⟹ ⋁<sub>(q,p)∈R</sub> δ<sub>(q,p)</sub>).
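For a finite toy model, the three conditions of Definition 4 can be checked by brute-force enumeration. The Python sketch below is only that: an explicit-state check over a hypothetical system in which a state is the set of keys currently in transfer and the view is a single key k. The definition itself requires first-order validity checks, which this sketch does not perform.

```python
from itertools import combinations


def check_phase_inductive(states, init, tr, phases, initial, delta, phi, domain):
    """Explicit-state check of initiation, inductiveness and edge covering.

    delta maps each edge (q, p) in R to a predicate on (pre, post, k);
    phi maps each phase to a predicate on (state, k); k ranges over domain.
    """
    initiation = all(phi[initial](s, k) for s in states if init(s) for k in domain)
    inductiveness = all(
        phi[p](t, k)
        for (q, p), d in delta.items()
        for k in domain for s in states for t in states
        if phi[q](s, k) and d(s, t, k))
    edge_covering = all(
        any(d(s, t, k) for (src, _), d in delta.items() if src == q)
        for q in phases for k in domain for s in states for t in states
        if phi[q](s, k) and tr(s, t))
    return {"initiation": initiation, "inductiveness": inductiveness,
            "edge_covering": edge_covering}


# Hypothetical toy system over two keys.
KEYS = ("k1", "k2")
STATES = [frozenset(c) for r in range(len(KEYS) + 1) for c in combinations(KEYS, r)]


def init(s):
    return s == frozenset()


def tr(s, t):
    # Exactly one key starts or finishes a transfer per step.
    return len(s ^ t) == 1


DELTA = {
    ("O", "O"): lambda s, t, k: k not in s and k not in t,
    ("O", "T"): lambda s, t, k: k not in s and k in t,
    ("T", "T"): lambda s, t, k: k in s and k in t,
    ("T", "O"): lambda s, t, k: k in s and k not in t,
}
PHI = {"O": lambda s, k: k not in s, "T": lambda s, k: k in s}

print(check_phase_inductive(STATES, init, tr, ["O", "T"], "O", DELTA, PHI, KEYS))
# {'initiation': True, 'inductiveness': True, 'edge_covering': True}
```

Dropping the (T, T) self-loop breaks edge covering: with k1 in transfer, a step that toggles k2 is not covered by any outgoing edge of T[k1].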

*Example 3.* The phase automaton in Fig. 2 is an inductive phase invariant. For example, the only disallowed transition in O[k] is recv\_transfer\_message, which indeed cannot execute in O[k] according to the characterization in line 75. Further, if, for example, a protocol's transition from O[k] matches the labeling of the edge to T[k] (i.e. a reshard action on k), the post-state necessarily satisfies the characterizations of T[k]: for instance, the post-state satisfies the uniqueness of unreceived transfer messages (line 82) because in the pre-state there are none (line 75).

**Lemma 1.** *If* A *is inductive w.r.t. TS then it is a phase invariant for TS.*

*Remark 1.* The careful reader may notice that the inductiveness requirement is stronger than needed to ensure that the characterizations form a phase invariant. It could be weakened to require, for every q ∈ Q: ∀V. (ϕ<sub>q</sub> ∧ *TR* ⟹ ⋁<sub>(q,p)∈R</sub> (δ<sub>(q,p)</sub> ∧ ϕ′<sub>p</sub>)). However, as we explain in Sect. 5, our notion of inductiveness is crucial for *inferring* inductive phase automata, which is the goal of this paper. Furthermore, for deterministic phase automata, the two requirements coincide.

*Inductive Invariants vs. Inductive Phase Invariants.* Inductive invariants and inductive phase invariants are closely related:

**Lemma 2.** *If* A *is inductive w.r.t. TS then* ∀V. ⋁<sub>q∈Q</sub> ϕ<sub>q</sub> *is an inductive invariant for TS. If Inv is an inductive invariant for TS, then the phase automaton* A<sub>*Inv*</sub> = ({q}, q, ∅, δ, ϕ)*, where* δ<sub>(q,q)</sub> = *TR and* ϕ<sub>q</sub> = *Inv, is an inductive phase automaton w.r.t. TS.*

In this sense, inductive phase invariants are as expressive as inductive invariants. However, as we show in this paper, their structure can be used by a user as an intuitive way to guide an automatic invariant inference algorithm.

*Safe Inductive Phase Invariants.* Next we show that an inductive phase invariant can be used to establish safety.

**Definition 5 (Safe Phase Automaton).** *Let* A *be a phase automaton over* Σ *with quantifiers* V*. Then* A *is* safe *w.r.t.* ∀V. P *if* ∀V. (ϕ<sub>q</sub> ⟹ P) *holds for every* q ∈ Q*.*

**Lemma 3.** *If* A *is inductive w.r.t. TS and safe w.r.t.* ∀V. P *then* ∀V.P *is an invariant of TS.*

#### **5 Inference of Inductive Phase Invariants**

In this section we turn to the *inference* of safe inductive phase invariants over a given phase structure, which guides the search. Formally, the problem we target is:

**Definition 6 (Inductive Phase Invariant Inference).** *Given a transition system TS* = (*Init*, *TR*)*, a phase structure* S = (Q, ι, V, δ) *and a safety property* ∀V. P*, all over* Σ*,* find *a safe inductive phase invariant* A *for TS over the phase structure* S*, if one exists.*

*Example 4.* Inference of an inductive phase invariant is provided with the phase structure in Fig. 2, which embodies an intuitive understanding of the different phases the protocol undergoes (see Example 1). The algorithm automatically finds phase characterizations forming a safe inductive phase invariant over the user-provided structure. We note that inference is valuable even after a phase structure is provided: in the running example, finding an inductive phase invariant is not easy; in particular, the characterizations in Fig. 2 relate different parts of the state and involve multiple quantifiers.

#### **5.1 Reduction to Constrained Horn Clauses**

We view each unknown phase characterization ϕ<sub>q</sub>, which we aim to infer for every q ∈ Q, as a predicate I<sub>q</sub>. The definition of a safe inductive phase invariant induces a set of second-order Constrained Horn Clauses (CHC) over the I<sub>q</sub>:

**Initiation.**

$$\mathit{Init} \implies \left(\forall \mathcal{V}.\ I_{\iota}\right)\tag{1}$$

**Inductiveness.** For every (q, p) ∈ R:

$$\forall \mathcal{V}.\ \left(I_q \wedge \delta_{(q,p)} \implies I'_p\right)\tag{2}$$

**Edge Covering.** For every q ∈ Q:

$$\forall \mathcal{V}.\ \left(I_q \wedge \mathit{TR} \implies \bigvee_{(q,p)\in\mathcal{R}} \delta_{(q,p)}\right)\tag{3}$$

**Safety.** For every q ∈ Q:

$$\forall \mathcal{V}.\ \left(I_q \implies \mathcal{P}\right)\tag{4}$$

where V denotes the quantifiers of A. All the constraints are *linear*, namely at most one unknown predicate appears on the left-hand side of each implication.

Constraint (4) captures the original safety requirement, whereas (3) can be understood as additional safety properties specified by the phase automaton (since no unknown predicates appear on the right-hand side of these implications).
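To make the clause system concrete, the following instantiates (1)–(4) for the phase structure of Fig. 2, with Q = {O, T}, V = {k} and edges R = {(O,O), (O,T), (T,T), (T,O)} as described in Example 1. Treating O as the initial phase is our assumption, since the initial phase is not stated in this section. The edge labels δ are left symbolic.

$$
\begin{aligned}
&\mathit{Init} \implies \forall k.\ I_O(k) \\
&\forall k.\ \big(I_O(k) \wedge \delta_{(O,O)} \implies I'_O(k)\big) \qquad
 \forall k.\ \big(I_O(k) \wedge \delta_{(O,T)} \implies I'_T(k)\big) \\
&\forall k.\ \big(I_T(k) \wedge \delta_{(T,T)} \implies I'_T(k)\big) \qquad
 \forall k.\ \big(I_T(k) \wedge \delta_{(T,O)} \implies I'_O(k)\big) \\
&\forall k.\ \big(I_O(k) \wedge \mathit{TR} \implies \delta_{(O,O)} \vee \delta_{(O,T)}\big) \qquad
 \forall k.\ \big(I_T(k) \wedge \mathit{TR} \implies \delta_{(T,T)} \vee \delta_{(T,O)}\big) \\
&\forall k.\ \big(I_O(k) \implies \mathcal{P}\big) \qquad
 \forall k.\ \big(I_T(k) \implies \mathcal{P}\big)
\end{aligned}
$$

Each of the nine clauses has at most one unknown on its left-hand side, so the system is linear. The two edge-covering clauses have no unknown on the right, which is why they behave like extra safety checks on I<sub>O</sub> and I<sub>T</sub>.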

A *solution* **I** to the CHC system associates each predicate I<sub>q</sub> with a formula ψ<sub>q</sub> over Σ (with *FV*(ψ<sub>q</sub>) ⊆ V) such that when ψ<sub>q</sub> is substituted for I<sub>q</sub>, all the constraints are satisfied (i.e., the corresponding first-order formulas are valid). A solution to the system induces a safe inductive phase automaton by characterizing each phase q by the interpretation of I<sub>q</sub>, and vice versa. Formally:

**Lemma 4.** *Let* A = (Q, R, ι, V, δ, ϕ) *with* ϕ<sub>q</sub> = **I**<sub>q</sub>*. Then* A *is a safe inductive phase invariant w.r.t. TS and* ∀V. P *if and only if* **I** *is a solution to the CHC system.*

Therefore, to infer a safe inductive phase invariant over a given phase structure, we need to solve the corresponding CHC system. In Sect. 6.1 we explain our approach for doing so for the class of universally quantified phase characterizations. Note that the weaker definition of inductiveness discussed in Remark 1 would prevent the reduction to CHC as it would result in clauses that are *not* Horn clauses.

*Completeness of Inductive Phase Invariants.* There are cases where a given phase structure induces a safe phase invariant A, but not an inductive one, making the CHC system unsatisfiable. However, a strengthening into an inductive phase invariant can always be used to prove that A is an invariant if (i) the language of invariants is unrestricted, and (ii) the phase structure is deterministic, namely, does not cover the same transition in two outgoing edges. Determinism of the automaton does not lose generality in the context of safety verification, since every inductive phase automaton can be converted to a deterministic one; non-determinism in fact offers no benefit, as it requires the same state to be characterized by multiple phases (see also Remark 1). These topics are discussed in detail in the extended version [24].

*Remark 2.* Each phase is associated with a set of states that can reach it, where a state σ can reach phase q if there is a sequence of program transitions that results in σ and can lead to q according to the automaton's transitions. This makes a phase structure different from a simple syntactical disjunctive template for inference, in which such semantic meaning is unavailable.
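For a finite model, the set of states associated with each phase in the sense of this remark can be computed explicitly by exploring the product of the transition system and the phase structure. The Python sketch below is hypothetical. It uses a toy model in which a state is the set of keys currently in transfer, viewed from a fixed key k. The sets it returns are exactly what each characterization ϕ<sub>q</sub> must overapproximate.

```python
from collections import deque
from itertools import combinations


def phase_reach(states, init, tr, initial, delta, k):
    """Map each phase to the states that can reach it (product of TS and phase structure)."""
    start = {(s, initial) for s in states if init(s)}
    seen = set(start)
    work = deque(start)
    while work:
        s, q = work.popleft()
        for t in states:
            if not tr(s, t):
                continue
            # Move along every outgoing edge whose label admits the transition.
            for (src, p), d in delta.items():
                if src == q and d(s, t, k) and (t, p) not in seen:
                    seen.add((t, p))
                    work.append((t, p))
    reach = {}
    for s, q in seen:
        reach.setdefault(q, set()).add(s)
    return reach


KEYS = ("k1", "k2")
STATES = [frozenset(c) for r in range(len(KEYS) + 1) for c in combinations(KEYS, r)]


def init(s):
    return s == frozenset()


def tr(s, t):
    return len(s ^ t) == 1


DELTA = {
    ("O", "O"): lambda s, t, k: k not in s and k not in t,
    ("O", "T"): lambda s, t, k: k not in s and k in t,
    ("T", "T"): lambda s, t, k: k in s and k in t,
    ("T", "O"): lambda s, t, k: k in s and k not in t,
}

reach = phase_reach(STATES, init, tr, "O", DELTA, "k1")
print(sorted(sorted(s) for s in reach["O"]))  # [[], ['k2']]
print(sorted(sorted(s) for s in reach["T"]))  # [['k1'], ['k1', 'k2']]
```

No characterization is consulted here: the sets depend only on the transition system and the phase structure. This semantic meaning is what a purely syntactic disjunctive template lacks.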

#### **5.2 Phase Structures as a Means to Guide Inference**

The search space of invariants over a phase structure is in fact *larger* than that of standard inductive invariants, because each phase can be associated with different characterizations. Sometimes the disjunctive structure of the phases (Lemma 2) uncovers a significantly simpler invariant than exists in the syntactical class of standard inductive invariants explored by the algorithm, but this is not always the case.<sup>1</sup> Nonetheless, the search for an invariant over the structure is *guided*, through the following aspects:

(1) *Phase decomposition.* Inference of an inductive phase invariant aims to find characterizations that overapproximate the set of states reachable in each phase (Remark 2). The distinction between phases is most beneficial when there is a considerable *difference* between the sets associated with different phases and their characterizations. For instance, in the running example, all states without unreceived transfer messages are associated with O[k], whereas all states in which such messages exist are associated with T[k]—a distinction captured by the characterizations in lines 75 and 82 in Fig. 2.

<sup>1</sup> As an illustration, the extended version [24] includes an inductive invariant for the running example which is comparable in complexity to the inductive phase invariant in Fig. 2.

Differences between phases would have two consequences. First, since each phase corresponds to fewer states than all reachable states, generalization—the key ingredient in inference procedures—is more focused. The second consequence stems from the fact that inductive characterizations of different phases are correlated. It is expected that a certain property is more readily learnable in one phase, while related facts in other phases are more complex. For instance, the characterization in line 75 in Fig. 2 is more straightforward than the one in line 82. Simpler facts in one phase can help characterize an adjacent phase when the algorithm analyzes how that property evolves along the edge. Thus utilizing the phase structure can improve the gradual construction of overapproximations of the sets of states reachable in each phase.


## **6 Implementation and Evaluation**

In this section we apply invariant inference guided by phase structures to distributed protocols modeled in EPR, motivated by previous deductive approaches [50,51,60].

#### **6.1 Phase-PDR***<sup>∀</sup>* **for Inferring Universally Quantified Characterizations**

We now describe our procedure for solving the CHC system of Sect. 5.1. It either (i) returns universally quantified phase characterizations that induce a safe inductive phase invariant, (ii) returns an abstract counterexample trace demonstrating that this is not possible, or (iii) diverges.

*EPR.* Our procedure handles transition systems expressed using the extended **E**ffectively **PR**opositional fragment (EPR) of first-order logic [51,52], and infers universally quantified phase characterizations. Satisfiability of (extended) EPR formulas is decidable, enjoys the finite-model property, and is supported by solvers such as Z3 [46] and iProver [41].

*Phase-PDR*∀*.* Our procedure is based on PDR<sup>∀</sup> [40], a variant of PDR [10,21] that infers universally quantified inductive invariants. PDR computes a sequence of *frames* F<sub>0</sub>, ..., F<sub>n</sub> such that F<sub>i</sub> overapproximates the set of states reachable in i steps. In our case, each frame F<sub>i</sub> is a mapping from a phase q to characterizations. The details of the algorithm are standard for PDR; we describe the gist of the procedure in the extended version [24]. We only stress the following: Counterexamples to safety take into account the safety property as well as disabled transitions. Search for predecessors is performed by going backwards on automaton edges, blocking counterexamples from preceding phases to prove an obligation in the current phase. Generalization is performed w.r.t. all incoming edges. As in PDR∀, proof obligations are constructed via diagrams [12]; in our setting these include the interpretation for the view quantifiers (see [24] for details).
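The shape of phase-indexed frames can be illustrated with exact, explicit-state reachability. This is a hypothetical sketch, not PDR: real frames are universally quantified formulas computed by blocking and generalization, whereas here each F<sub>i</sub> is the exact set of states reachable within at most i steps (the usual PDR convention), split by phase, on the same toy message-in-flight model as above (an assumption). The loop stops when two consecutive frames coincide, the explicit-state analogue of PDR detecting an inductive frame.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str, Callable[[Hashable], bool]]  # (src, action, dst, precondition)
Frame = Dict[str, Set[Hashable]]                           # phase -> states


def exact_frames(init_states: Iterable[Hashable], init_phase: str,
                 trans: Dict[str, Callable[[Hashable], Iterable[Hashable]]],
                 edges: List[Edge], max_depth: int) -> List[Frame]:
    """Return [F_0, F_1, ...], where F_i maps each phase to the states reachable
    in at most i steps while the automaton is in that phase."""
    reached = {(init_phase, s) for s in init_states}
    frontier = set(reached)
    frames: List[Frame] = []
    for _ in range(max_depth + 1):
        frame: Frame = {}
        for phase, s in reached:
            frame.setdefault(phase, set()).add(s)
        frames.append(frame)
        if len(frames) >= 2 and frames[-1] == frames[-2]:
            break  # fixpoint: F_i == F_{i+1}
        new = set()
        for phase, s in frontier:
            for src, action, dst, pre in edges:
                if src == phase and pre(s):
                    new.update((dst, t) for t in trans[action](s) if (dst, t) not in reached)
        reached |= new
        frontier = new
    return frames


trans = {"send": lambda busy: [] if busy else [True],
         "recv": lambda busy: [False] if busy else []}
edges = [("O", "send", "T", lambda s: True), ("T", "recv", "O", lambda s: True)]
frames = exact_frames([False], "O", trans, edges, max_depth=10)
print(len(frames))                                   # 3
print(frames[-1] == {"O": {False}, "T": {True}})     # True
```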

*Edge Covering Check in EPR.* In our setting, Eqs. (1), (2) and (4) fall in EPR, but not Eq. (3). Thus, we restrict edge labeling so that each edge is labeled with a *TR* of an action, together with an alternation-free precondition. It then suffices to check implications between the preconditions and the entire *TR* (see the extended version [24]). Such edge labeling is sufficiently expressive for all our examples. Alternatively, sound but incomplete bounded quantifier instantiation [23] could be used, potentially allowing more complex decompositions of *TR*.

*Absence of Inductive Phase Characterizations.* What happens when the user gets the automaton wrong? One case is when there does not exist an inductive phase invariant with universal phase characterizations over the given structure. When this occurs, our tool can return an *abstract counterexample trace*—a sequence of program transitions and transitions of the automaton (inspired by [40,49])—which constitutes a proof of that fact (see the extended version [24]). The counterexample trace can assist the user in debugging the automaton or the program and modifying them. For instance, missing edges occurred frequently when we wrote the automata of Sect. 6, and we used the generated counterexample traces to correct them.

Another type of failure is when an inductive phase invariant exists but the automaton does not direct the search well towards it. In this case the user may decide to terminate the analysis and articulate a different intuition via a different phase structure. In standard inference procedures, the only way to affect the search is by modifying the transition system; instead, phase structures equip the user with an ability to guide the search.

#### **6.2 Evaluation**

We evaluate our approach for user-guided invariant inference by comparing Phase-PDR<sup>∀</sup> to standard PDR∀. We implemented PDR<sup>∀</sup> and Phase-PDR<sup>∀</sup> in MYPYVY [2], a new system for invariant inference inspired by Ivy [45], over Z3 [46]. We study:


*Protocols.* We applied PDR<sup>∀</sup> and Phase-PDR<sup>∀</sup> to the most challenging examples admitting universally-quantified invariants, which previous works verified using deductive techniques. The protocols we analyzed are listed below and in Table 1. The full models appear in [1]. The KV-R protocol analyzed is taken from one of the two realistic systems studied by the IronFleet paper [33] using deductive verification.

*Phase Structures.* The phase structures we used appear in [1]. In all our examples, it was straightforward to translate the existing high-level intuition of important and relevant distinctions between phases in the protocol into the phase structures we report. For example, it took us less than an hour to finalize an automaton for KV-R. We emphasize that phase structures do not include phase characterizations; the user need not supply them, nor do they need to understand the inference procedure. Our exposition of the phase structures below refers to an intuitive meaning of each phase, but this is not part of the phase structure provided to the tool.

**Table 1.** Running times in seconds of PDR<sup>∀</sup> and Phase-PDR∀, presented as the mean and standard deviation (in parentheses) over 16 different Z3 random seeds. "∗" indicates that some runs did not converge after 1 h and were not included in the summary statistics. "> 1 h" means that no runs of the algorithm converged in 1 h. #p refers to the number of phases and #v to the number of view quantifiers in the phase structure. #r refers to the number of relations and |a| to the maximal arity. The remaining columns describe the inductive invariant/phase invariant obtained in inference. |f| is the maximal frame reached. #c, #q are the mean number of clauses and quantifiers (excluding view quantifiers) per phase, ranging across the different phases.
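The caption's summary conventions can be read as the following formatting rule for a single cell. This is a hypothetical sketch: the function name, the use of `None` for a run that hit the 1 h limit, the two-decimal format, the placement of "∗", and the choice of the sample (rather than population) standard deviation are all assumptions, since the caption does not specify them.

```python
import statistics
from typing import List, Optional


def summarize(times: List[Optional[float]]) -> str:
    """Format one Table 1 cell from per-seed running times in seconds.
    None marks a run that did not converge within the 1 h limit."""
    converged = [t for t in times if t is not None]
    if not converged:
        return "> 1 h"                      # no run converged
    mean = statistics.mean(converged)
    # Assumption: sample standard deviation; the caption does not say which is used.
    sd = statistics.stdev(converged) if len(converged) > 1 else 0.0
    marker = "*" if len(converged) < len(times) else ""  # some runs excluded
    return f"{mean:.2f} ({sd:.2f}){marker}"
```

Non-converged runs are excluded before computing the mean and standard deviation, matching the caption's statement that they "were not included in the summary statistics".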


**(1) Achieving Convergence Through Phases.** In this section we consider the effect of phases on inference for examples on which standard PDR<sup>∀</sup> does not converge in 1 hr. *Examples. Sharded key-value store with retransmissions (KV-R)*: see Sect. 3 and Example 1. This protocol has not been modeled in decidable logic before.

*Cache Coherence.* This example implements the classic MESI protocol for maintaining cache coherence in a shared-memory multiprocessor [36], modeled in decidable logic for the first time. Cores perform reads and writes to memory, and caches snoop on each other's requests using a shared bus and maintain the invariant that there is at most one writer of a particular cache line. For simplicity, we consider only a single cache line, and yet the example is still challenging for PDR∀. Standard explanations of this protocol in the literature already use automata to describe this invariant, and we directly exploit this structure in our phase automaton. *Phase Structure:* There are 10 phases in total, grouped into three parts corresponding to the modified, exclusive, and shared states in the classical description. Within each group, there are additional phases for when a request is being processed by the bus. For example, in the shared group, there are phases for handling reads by cores without a copy of the cache line, writes by such cores, and also writes by cores that *do* have a copy. Overall, the phase structure is directly derived from textbook descriptions, taking into account that use of the shared bus is not atomic. *Results and Discussion.* Measurements for these examples appear in Table 1. Standard PDR<sup>∀</sup> fails to converge in less than an hour on 13 out of 16 seeds for KV-R and all 16 seeds for the cache. In contrast, Phase-PDR<sup>∀</sup> converges to a proof in a few minutes in all cases. These results demonstrate that phase structures can effectively guide the search and obtain an invariant quickly where standard inductive invariant inference does not.

**(2) Enhancing Performance Through Phases.** In this section we consider the use of phase structures to improve the speed of convergence to a proof.

*Examples. Distributed lock service,* adapted from [61], allows clients to acquire and release locks by sending requests to a central server, which guarantees that only one client holds each lock at a time. *Phase structure*: for each lock, the phases follow the 4 steps by which a client completes a cycle of acquire and release. We also consider a simpler variant with only a single lock, reducing the arity of all relations and removing the need for an automaton view. Its *phase structure* is the same, only for a single lock.

*Simple quorum-based consensus*, based on the example in [60]. In this protocol, nodes propose themselves and then receive votes from other nodes. When a quorum of votes for a node is obtained, it becomes the leader and decides on a value. Safety requires that decided values are unique. The *phase structure* distinguishes between the phases before any node is elected leader, once a node is elected, and when values are decided. Note that the automaton structure is unquantified.

*Leader election in a ring* [13,51], in which nodes are organized in a directional ring topology with unique IDs, and the safety property is that an elected leader is a node with the highest ID. *Phase structure*: for a view of two nodes n<sub>1</sub>, n<sub>2</sub>, in the first phase, messages with the ID of n<sub>1</sub> are yet to advance in the ring past n<sub>2</sub>, while in the second phase, a message advertising n<sub>1</sub> has advanced past n<sub>2</sub>. The inferred characterizations include another quantifier on nodes, constraining interference (see Sect. 7).

*Sharded key-value store (KV)* is a simplified version of KV-R above, without message drops and the retransmission mechanism. The *phase structure* is exactly as in KV-R, omitting transitions related to sequence numbers and acknowledgment. This protocol has not been modeled in decidable logic before.

*Results and Discussion.* We compare the performance of standard PDR<sup>∀</sup> and Phase-PDR<sup>∀</sup> on the above examples, with results shown in Table 1. For each example, we ran the two algorithms on 16 different Z3 random seeds. Measurements were performed on a 3.4GHz AMD Ryzen Threadripper 1950X with 16 physical cores, running Linux 4.15.0, using Z3 version 4.7.1. By disabling hyperthreading and frequency scaling and pinning tasks to dedicated cores, variability across runs of a single seed was negligible.

In all but one example, Phase-PDR<sup>∀</sup> improves performance, sometimes drastically; for example, performance for leader election in a ring is improved by a factor of 60. Phase-PDR<sup>∀</sup> also improves the *robustness* of inference [27] on this example, as the standard deviation falls from 39 in PDR<sup>∀</sup> to 0.04 in Phase-PDR∀.

The only example in which a phase structure actually diminishes inference effectiveness is simple consensus. We attribute this to an automaton structure that does not capture the essence of the correctness argument very well, overlooking votes and quorums. This demonstrates that a phase structure might guide the search towards counterproductive directions if the user guidance is "misleading". This suggests that better resiliency of the interactive inference framework could be achieved by combining phase-based inference with standard inductive invariant-based reasoning. We are not aware of a single "good" automaton for this example. The correctness argument of this example is better captured by the conjunction of two automata (one for votes and one for accumulating a quorum) with different views, but the problem of inferring phase invariants for mutually-dependent automata is a subject for future work.

**(3) Anatomy of the Benefit of Phases.** We now demonstrate that each of the beneficial aspects of phases discussed in Sect. 5.2 is important for the benefits reported above.

*Phase Decomposition.* Is there a benefit from a phase structure even without disabled transitions? Leader election in a ring answers this question positively: it demonstrates a huge performance benefit even without disabled transitions.

*Disabled Transitions.* Is there a substantial gain from exploiting disabled transitions? We compare Phase-PDR<sup>∀</sup> on the structure with disabled transitions and a structure obtained by (artificially) adding self loops labeled with the originally impossible transitions, on the example of lock service with multiple locks (Sect. 6.2), since it demonstrates a performance benefit using Phase-PDR<sup>∀</sup> and showcases several disabled transitions in each phase. The result is that without disabled transitions, the mean running time of Phase-PDR<sup>∀</sup> on this example jumps from 2.73 s to 6.24 s. This demonstrates the utility of the additional safety properties encompassed in disabled transitions.

*Phase-Awareness.* Is it important to treat phases explicitly in the inference algorithm, as we do in Phase-PDR<sup>∀</sup> (Sect. 6.1)? We compare our result on convergence of KV-R with an alternative in which standard PDR<sup>∀</sup> is applied to an encoding of the phase decomposition and disabled transitions by *ghost state*: each phase is modeled by a relation over possible view assignments, and the model is augmented with update code mimicking phase changes; the additional safety properties derived from disabled transitions are provided; and the view and the appropriate modification of the safety property are introduced. This translation expresses all information present in the phase structure, but does not explicitly guide the inference algorithm to use this information. The result is that with this ghost-based modeling the phase-oblivious PDR<sup>∀</sup> does not converge in 1 h on KV-R in any of the 16 runs, whereas it converges when Phase-PDR<sup>∀</sup> explicitly directs the search using the phase structure.
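The ghost-state encoding can be pictured with a toy executable model. Everything in this sketch is hypothetical: the class, field, and action names are invented for illustration, the protocol is a drastically simplified key transfer (not the KV-R model of [1], and not written in MYPYVY's input language), and the phases `"O"`/`"T"` merely echo the O[k]/T[k] phases of the running example. The ghost field `phase` records the automaton phase per view (key), each action updates it as the corresponding automaton edge would, and an `assert` plays the role of the additional safety property contributed by a disabled transition.

```python
from typing import Dict, Hashable, Iterable, Optional


class GhostPhaseKV:
    """Toy key-transfer model with a ghost relation phase[k] for each view k."""

    def __init__(self, keys: Iterable[Hashable], first_node: Hashable) -> None:
        self.owner: Dict[Hashable, Optional[Hashable]] = {k: first_node for k in keys}
        self.in_flight: Dict[Hashable, Hashable] = {}  # k -> destination of an unreceived transfer
        self.phase: Dict[Hashable, str] = {k: "O" for k in self.owner}  # ghost state

    def send_transfer(self, k: Hashable, dst: Hashable) -> bool:
        if self.owner[k] is None:          # protocol precondition: someone owns k
            return False
        # Safety property from a disabled transition: sending is never enabled in T[k].
        assert self.phase[k] == "O"
        self.owner[k] = None
        self.in_flight[k] = dst
        self.phase[k] = "T"                # ghost update mimicking edge O[k] -> T[k]
        return True

    def recv_transfer(self, k: Hashable) -> bool:
        if k not in self.in_flight:        # protocol precondition: a transfer is pending
            return False
        assert self.phase[k] == "T"        # receiving is never enabled in O[k]
        self.owner[k] = self.in_flight.pop(k)
        self.phase[k] = "O"                # ghost update mimicking edge T[k] -> O[k]
        return True
```

As the text notes, such an encoding carries the same information as the phase structure, but a phase-oblivious inference algorithm is not told to case-split on `phase`.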

#### **7 Related Work**

*Phases in Distributed Protocols.* Distributed protocols are frequently described informally as transitioning between different phases. Recently, PSync [19] used the Heard-Of model [14], which describes protocols as operating in rounds, as a basis for the implementation and verification of fault-tolerant distributed protocols. Typestates (e.g., [25,59]) also bear some similarity to the temporal aspect of phases. State machine refinement [3,28] is used extensively in the design and verification of distributed systems (see e.g. [33,47]). The automaton structure of a phase invariant is also a form of state machine; our focus is on inference of characterizations establishing this.

*Interaction in Verification.* Interactive proof assistants such as Coq [8] and Isabelle/HOL [48] interact with users to aid them as they attempt to prove candidate inductive invariants. This differs from interaction through phase structures and counterexample traces. Ivy uses interaction for invariant inference by interactive generalization from counterexamples [51]. This approach is less automatic as it requires interaction for every clause of the inductive invariant. In terminology from synthesis [30], the use of counterexamples is *synthesizer-driven* interaction with the tool, while interaction via phase structures is mainly *user-driven*. Abstract counterexample traces returned by the tool augment this kind of interaction. As [38] has shown, interactive invariant inference, when considered as a synthesis problem (see also [27,55]) is related to inductive learning.

*Template-Based Invariant Inference.* Many works employ syntactical templates for invariants, used to constrain the search (e.g., [7,16,54,57,58]). The different phases in a phase structure induce a disjunctive form, but crucially each disjunct also has a distinct semantic meaning, which inference overapproximates, as explained in Sect. 5.2.

*Automata in Safety Verification.* Safety verification through an automaton-like refinement of the program's control has been studied in a number of works. We focus on related techniques for proof automation. The *Automizer* approach to the verification of sequential programs [34,35] is founded on the notion of a *Floyd-Hoare automaton*, which is an unquantified inductive phase automaton; an extension to parallel programs [22] uses thread identifiers closed under the symmetry rule, which are related to view quantifiers. Their focus is on the automatic, incremental construction of such automata as a union of simpler automata, where each automaton is obtained from generalizing the proof/infeasibility of a single trace. In our approach the structure of the automaton is provided by the user as a means of conveying their intuition of the proof, while the annotations are computed automatically. A notable difference is that in Automizer, the generation of characterizations in an automaton constructed from a single trace does not utilize the phase structure (beyond that of the trace), whereas in our approach the phase structure is central in generalization from states to characterizations. In *trace partitioning* [44,53], abstract domains based on transition systems partitioning the program's control are introduced. The observation is that recording historical information forms a basis for case-splitting, as an alternative to fully-disjunctive abstractions. This differs from our motivation of distinguishing between different protocol phases. The phase structure of the domain is determined by the analyser, and can also be dynamic. In our work the phase structure is provided by the user as guidance. We use a variant of PDR∀, rather than abstract interpretation [17], to compute universally quantified phase characterizations. 
Techniques such as *predicate abstraction* [26,29] and *existential abstraction* [15], as well as the safety part of *predicate diagrams* [11], use finite languages for the set of possible characterizations and lack the notion of views, both essential for handling unbounded numbers of processes and resources. Finally, *phase splitter predicates* [56] share our motivation of simplifying invariant inference by exposing the different phases the loop undergoes. Splitter predicates correspond to inductive phase characterizations [56, Theorem 1], and are automatically constructed according to program conditionals. In our approach, decomposition is performed by the user using potentially non-inductive conditions, and the inductive phase characterizations are computed by invariant inference. Successive loop splitting results in a sequence of phases, whereas our approach utilizes arbitrary automaton structures. Borralleras et al. [9] also refine the control-flow graph throughout the analysis by splitting on conditions, which are discovered as preconditions for termination (the motivation is to expose termination proof goals to be established): in a sense, the phase structure is grown from candidate characterizations implying termination. This differs from our approach in which the phase structure is used to guide the inference of characterizations.

*Quantified Invariant Inference.* We focus here on the works on quantifiers in automatic verification most closely related to our work. In *predicate abstraction*, quantifiers can be used internally as part of the definitions of predicates, and also externally through predicates with free variables [26,42]. Our work uses quantifiers both internally in phase characterizations and externally in view quantifiers. The view is also related to the bounded number of quantifiers used in *view abstraction* [5,6]. In this work we observe that it is useful to consider views of entities beyond processes or threads, such as a single key in the store. Quantifiers are often used to their full extent in verification conditions, namely checking implication between two quantified formulas, but they are sometimes employed in weaker checks as part of thread-modular proofs [4,39]. This amounts to searching for invariants provable using specific instantiations of the quantifiers in the verification conditions [31,37]. In our verification conditions, the view quantifiers are localized, in effect performing a single instantiation. This is essential for exploiting the disjunctive structure under the quantifiers, allowing inference to consider a single automaton edge in each step, and reflecting an intuition of correctness. When necessary to constrain interference, quantifiers in phase characterizations can be used to establish necessary facts about interfering views. Finally, there exist algorithms other than PDR<sup>∀</sup> for solving CHC by predicates with universal invariants (e.g., [20,32]).

#### **8 Conclusion**

Invariant inference techniques aiming to verify intricate distributed protocols must adjust to the diverse correctness arguments on which protocols are based. In this paper we have proposed to use phase structures as a means of conveying users' intuition of the proof, to be used by an automatic inference tool as a basis for a full formal proof. We found that inference guided by a phase structure can infer proofs for distributed protocols that are beyond reach for state-of-the-art inductive invariant inference methods, and can also improve the speed of convergence. The phase decomposition induced by the automaton, the use of disabled transitions, and the explicit treatment of phases in inference all combine to direct the search for the invariant. We are encouraged by our experience of specifying phase structures for different protocols. It would be interesting to integrate the interaction via phase structures with other verification methods and proof logics, as well as interaction schemes based on different, complementary, concepts. Another important direction for future work is inference beyond universal invariants, required for example for the proof of Paxos [50].

**Acknowledgements.** We thank Kalev Alpernas, Javier Esparza, Neil Immerman, Shachar Itzhaky, Oded Padon, Andreas Podelski, Tom Reps, and the anonymous referees for insightful comments which improved this paper. This publication is part of a project that has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. [759102-SVIS]). The research was partially supported by Len Blavatnik and the Blavatnik Family foundation, the Blavatnik Interdisciplinary Cyber Research Center, Tel Aviv University, the Israel Science Foundation (ISF) under grant No. 1810/18, the United States-Israel Binational Science Foundation (BSF) grant No. 2016260, and the National Science Foundation under Grant No. 1749570. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Termination of Triangular Integer Loops is Decidable**

Florian Frohn<sup>1</sup> and Jürgen Giesl<sup>2</sup>(B)

<sup>1</sup> Max Planck Institute for Informatics, Saarbrücken, Germany florian.frohn@mpi-inf.mpg.de <sup>2</sup> LuFG Informatik 2, RWTH Aachen University, Aachen, Germany giesl@informatik.rwth-aachen.de

**Abstract.** We consider the problem whether termination of affine integer loops is decidable. Since Tiwari conjectured decidability in 2004 [15], only special cases have been solved [3,4,14]. We complement this work by proving decidability for the case that the update matrix is triangular.

#### **1 Introduction**

We consider affine integer loops of the form

$$\text{while } \varphi \text{ do } \overline{x} \leftarrow A\overline{x} + \overline{a}. \tag{1}$$

Here, $A \in \mathbb{Z}^{d \times d}$ for some dimension $d \geq 1$, $\overline{x}$ is a column vector of pairwise different variables $x_1, \ldots, x_d$, $\overline{a} \in \mathbb{Z}^d$, and $\varphi$ is a conjunction of inequalities of the form $\alpha > 0$ where $\alpha \in \mathcal{A}f[\overline{x}]$ is an affine expression with rational coefficients<sup>1</sup> over $\overline{x}$ (i.e., $\mathcal{A}f[\overline{x}] = \{\overline{c}^T \overline{x} + c \mid \overline{c} \in \mathbb{Q}^d, c \in \mathbb{Q}\}$). So $\varphi$ has the form $B\overline{x} + \overline{b} > \overline{0}$ where $\overline{0}$ is the vector containing $k$ zeros, $B \in \mathbb{Q}^{k \times d}$, and $\overline{b} \in \mathbb{Q}^k$ for some $k \in \mathbb{N}$. Definition 1 formalizes the intuitive notion of termination for such loops.
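Footnote 1 observes that rational coefficients add no expressive power. A minimal sketch of that normalization for a single inequality $\overline{c}^T\overline{x} + c > 0$, using Python's `fractions` module and `math.lcm` (Python 3.9+); the function name is hypothetical. Since the least common multiple of the denominators is positive, multiplying by it preserves the strict inequality.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+
from typing import List, Tuple, Union

Num = Union[int, Fraction]


def to_integer_constraint(coeffs: List[Num], const: Num) -> Tuple[List[int], int]:
    """Turn coeffs^T x + const > 0 (rational) into an equivalent constraint with
    integer coefficients by multiplying with the lcm of all denominators."""
    values = [Fraction(v) for v in [*coeffs, const]]
    m = lcm(*(v.denominator for v in values))
    scaled = [int(v * m) for v in values]  # exact: every v * m has denominator 1
    return scaled[:-1], scaled[-1]


# x/2 - y/3 + 1/6 > 0  becomes  3x - 2y + 1 > 0
print(to_integer_constraint([Fraction(1, 2), Fraction(-1, 3)], Fraction(1, 6)))  # ([3, -2], 1)
```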

**Definition 1 (Termination).** *Let* $f : \mathbb{Z}^d \to \mathbb{Z}^d$ *with* $f(\overline{x}) = A\overline{x} + \overline{a}$*. If*

$$
\exists \overline{c} \in \mathbb{Z}^d. \,\forall n \in \mathbb{N}. \,\varphi[\overline{x}/f^n(\overline{c})],
$$

*then (1) is* non-terminating *and* $\overline{c}$ *is a* witness *for non-termination. Otherwise, (1)* terminates*.*
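Definition 1 can be explored by direct simulation. The sketch below (function names hypothetical) checks $\varphi[\overline{x}/f^n(\overline{c})]$ for $n = 0, \ldots$, `steps`. A result of `False` refutes $\overline{c}$ as a witness, but `True` is only evidence: the definition quantifies over all $n \in \mathbb{N}$, so no bounded check can confirm non-termination, which is why a decision procedure must reason symbolically.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def step(A: Matrix, a: Sequence[int], x: Sequence[int]) -> List[int]:
    """One application of f(x) = A x + a (all variables updated simultaneously)."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) + a[i] for i in range(len(x))]


def guard(B, b, x) -> bool:
    """phi = B x + b > 0, checked row by row (B and b may contain Fractions)."""
    return all(sum(B[i][j] * x[j] for j in range(len(x))) + b[i] > 0 for i in range(len(b)))


def survives(A, a, B, b, c, steps: int) -> bool:
    """True iff phi holds on f^n(c) for every n = 0, ..., steps."""
    x = list(c)
    for _ in range(steps + 1):
        if not guard(B, b, x):
            return False
        x = step(A, a, x)
    return True


# while x > 0 do x <- x - 1: starting from c = 5 the guard fails at n = 5.
print(survives([[1]], [-1], [[1]], [0], [5], steps=4))  # True
print(survives([[1]], [-1], [[1]], [0], [5], steps=5))  # False
```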

Here, $f^n$ denotes the $n$-fold application of $f$, i.e., we have $f^0(\overline{c}) = \overline{c}$ and $f^{n+1}(\overline{c}) = f(f^n(\overline{c}))$. We call $f$ the *update* of (1). Moreover, for any entity $s$, $s[x/t]$ denotes the entity that results from $s$ by replacing all occurrences of $x$ by $t$. Similarly, if $\overline{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix}$ and $\overline{t} = \begin{bmatrix} t_1 \\ \vdots \\ t_m \end{bmatrix}$, then $s[\overline{x}/\overline{t}]$ denotes the entity resulting from $s$ by replacing all occurrences of $x_i$ by $t_i$ for each $1 \leq i \leq m$.

<sup>1</sup> Note that multiplying with the least common multiple of all denominators yields an equivalent constraint with integer coefficients, i.e., allowing rational instead of integer coefficients does not extend the considered class of loops.

Funded by DFG grant 389792660 as part of TRR 248 and by DFG grant GI 274/6.

**Example 2.** *Consider the loop*

$$\text{while } y + z > 0 \text{ do}\\
\begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix} \leftarrow \begin{bmatrix} 2 \\ x + 1 \\ -w - 2 \cdot y \\ x \end{bmatrix}$$

*where the update of all variables is executed simultaneously. This program belongs to our class of affine loops, because it can be written equivalently as follows.*

$$\text{while } y+z>0 \text{ do } \begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix} \leftarrow \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & -2 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix} + \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \end{bmatrix}$$
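As a quick sanity check of the example, the following sketch compares the simultaneous update $(w, x, y, z) \leftarrow (2,\, x+1,\, -w-2y,\, x)$ with the matrix form $A\overline{x} + \overline{a}$ on a small box of integer inputs. The exhaustive comparison is only a finite test, not a proof; equality for all inputs follows by reading off the matrix rows.

```python
from itertools import product

A = [[0, 0, 0, 0],
     [0, 1, 0, 0],
     [-1, 0, -2, 0],
     [0, 1, 0, 0]]
a = [2, 1, 0, 0]


def direct_update(w, x, y, z):
    # All right-hand sides read the old values (simultaneous assignment).
    return (2, x + 1, -w - 2 * y, x)


def matrix_update(v):
    # Row i of A x + a, for the variable order (w, x, y, z).
    return tuple(sum(A[i][j] * v[j] for j in range(4)) + a[i] for i in range(4))


# Compare both forms on every integer point of [-3, 3]^4.
assert all(direct_update(*v) == matrix_update(v) for v in product(range(-3, 4), repeat=4))
print(matrix_update((0, 0, 1, 0)))  # (2, 1, -2, 0)
```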

While termination of affine loops is known to be decidable if the variables range over the reals [15] or the rationals [4], the integer case is a well-known open problem [2–4,14,15].<sup>2</sup> However, certain special cases have been solved: Braverman [4] showed that termination of *linear* loops is decidable (i.e., loops of the form (1) where $\overline{a}$ is $\overline{0}$ and $\varphi$ is of the form $B\overline{x} > \overline{0}$). Bozga et al. [3] showed decidability for the case that the update matrix $A$ in (1) has the *finite monoid property*, i.e., if there is an $n > 0$ such that $A^n$ is diagonalizable and all eigenvalues of $A^n$ are in $\{0, 1\}$. Ouaknine et al. [14] proved decidability for the case $d \leq 4$ and for the case that $A$ is diagonalizable.

Ben-Amram et al. [2] showed undecidability of termination for certain extensions of affine integer loops, e.g., for loops where the body is of the form **if** $x > 0$ **then** $\overline{x} \leftarrow A\overline{x}$ **else** $\overline{x} \leftarrow A'\overline{x}$ where $A, A' \in \mathbb{Z}^{d \times d}$ and $x \in \overline{x}$.

In this paper, we present another substantial step towards the solution of the open problem whether termination of affine integer loops is decidable. We show that termination is decidable for *triangular* loops (1) where $A$ is a triangular matrix (i.e., all entries of $A$ below or above the main diagonal are zero). Clearly, the order of the variables is irrelevant, i.e., our results also cover the case that $A$ can be transformed into a triangular matrix by reordering $A$, $\overline{x}$, and $\overline{a}$ accordingly.<sup>3</sup> So essentially, triangularity means that the program variables $x_1, \ldots, x_d$ can be ordered such that in each loop iteration, the new value of $x_i$ only depends on the previous values of $x_1, \ldots, x_{i-1}, x_i$. Hence, this excludes programs with "cyclic dependencies" of variables (e.g., where the new values of $x$ and $y$ both depend on the old values of both $x$ and $y$). While triangular loops are a very restricted subclass of general integer programs, integer programs often contain such loops. Hence, tools for termination analysis of such programs (e.g., [5–8,11–13]) could benefit from integrating our decision procedure and applying it whenever a subprogram is an affine triangular loop.

<sup>2</sup> The proofs for real or rational numbers do not carry over to the integers since [15] uses Brouwer's Fixed Point Theorem, which is not applicable if the variables range over $\mathbb{Z}$, and [4] relies on the density of $\mathbb{Q}$ in $\mathbb{R}$.

<sup>3</sup> Similarly, one could of course also use other termination-preserving pre-processings and try to transform a given program into a triangular loop.
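Whether a suitable variable order exists can be checked mechanically: say the new value of $x_i$ depends on the old $x_j$ whenever $A_{ij} \neq 0$, ignore self-dependencies, and topologically sort. The sketch below (function names hypothetical, Python 3.9+ for `graphlib`) returns an order under which the simultaneously permuted matrix is lower triangular, or `None` if there is a cyclic dependency. Upper-triangular forms are covered by reversing the order.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+
from typing import List, Optional


def triangular_order(A: List[List[int]]) -> Optional[List[int]]:
    """Order of variable indices making A lower triangular, or None if the
    variables have a cyclic dependency."""
    d = len(A)
    # deps[i]: variables whose old values the new value of x_i reads (excluding x_i).
    deps = {i: {j for j in range(d) if j != i and A[i][j] != 0} for i in range(d)}
    try:
        return list(TopologicalSorter(deps).static_order())  # dependencies come first
    except CycleError:
        return None


def permute(A: List[List[int]], order: List[int]) -> List[List[int]]:
    """Reorder rows and columns of A simultaneously according to `order`."""
    return [[A[r][c] for c in order] for r in order]


def is_lower_triangular(M: List[List[int]]) -> bool:
    return all(M[i][j] == 0 for i in range(len(M)) for j in range(i + 1, len(M)))


A = [[1, 1], [0, 1]]              # x <- x + y; y <- y - 1 (order: x, y)
order = triangular_order(A)
print(order)                                        # [1, 0]: y before x
print(is_lower_triangular(permute(A, order)))       # True
print(triangular_order([[1, 1], [1, 1]]))           # None: x and y depend on each other
```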

Note that triangularity and diagonalizability of matrices do not imply each other. As we consider loops with arbitrary dimension, this means that the class of loops considered in this paper is not covered by [3,14]. Since we consider affine instead of linear loops, it is also orthogonal to [4].

To see the difference between our and previous results, note that a triangular matrix $A$ where $c_1, \ldots, c_k$ are the *distinct* entries on the diagonal is diagonalizable iff $(A - c_1 I) \cdots (A - c_k I)$ is the zero matrix.<sup>4</sup> Here, $I$ is the identity matrix. So an easy example for a triangular loop where the update matrix is not diagonalizable is the following well-known program (see, e.g., [2]):

$$\text{while } x > 0 \text{ do } x \gets x + y; \; y \gets y - 1.$$

It terminates as $y$ eventually becomes negative and then $x$ decreases in each iteration. In matrix notation, the loop body is

$$\begin{bmatrix} x \\ y \end{bmatrix} \leftarrow \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 0 \\ -1 \end{bmatrix},$$

i.e., the update matrix is triangular. Thus, this program is in our class of programs where we show that termination is decidable. However, the only entry on the diagonal of the update matrix $A$ is $c = 1$ and $A - c\,I = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ is not the zero matrix. So $A$ (and in fact each $A^n$ where $n \geq 1$) is not diagonalizable. Hence, extensions of this example to a dimension greater than 4 where the loop is still triangular are not covered by any of the previous results.<sup>5</sup>
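The diagonalizability criterion is easy to test mechanically. The following Python sketch is our own illustration (the names `diagonalizable_triangular` and `iterations` are hypothetical, not from the paper): it multiplies the factors $(A - c_i I)$ for the distinct diagonal entries of a triangular integer matrix and checks whether the product vanishes, and it runs the example loop from one start value to illustrate that it terminates.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def diagonalizable_triangular(A):
    """Criterion for a triangular matrix A: with c_1, ..., c_k the distinct
    diagonal entries, A is diagonalizable iff (A - c_1 I)...(A - c_k I) == 0."""
    n = len(A)
    prod = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    for c in sorted({A[i][i] for i in range(n)}):
        shifted = [[A[i][j] - (c if i == j else 0) for j in range(n)]
                   for i in range(n)]
        prod = mat_mul(prod, shifted)
    return all(v == 0 for row in prod for v in row)

def iterations(x, y, limit=10_000):
    """Run 'while x > 0 do x <- x + y; y <- y - 1' and count the iterations
    (None if the arbitrary bound 'limit' is reached)."""
    steps = 0
    while x > 0:
        if steps == limit:
            return None
        x, y = x + y, y - 1
        steps += 1
    return steps

print(diagonalizable_triangular([[1, 1], [0, 1]]))  # False: A - I != 0
print(diagonalizable_triangular([[2, 0], [5, 3]]))  # True: distinct diagonal
print(iterations(1, 5))                              # 12
```

Since all factors $(A - c_i I)$ are polynomials in $A$, they commute, so the order of the multiplication does not matter.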

Our proof that termination is decidable for triangular loops proceeds in three steps. We first prove that termination of triangular loops is decidable iff termination of *non-negative triangular* loops (*nnt-loops*) is decidable, cf. Sect. 2. A loop is non-negative if the diagonal of $A$ does not contain negative entries. Second, we show how to compute *closed forms* for nnt-loops, i.e., vectors $\overline{q}$ of $d$ expressions over the variables $\overline{x}$ and $n$ such that $\overline{q}[n/c] = f^c(\overline{x})$ for all $c \geq 0$, see Sect. 3. Here, triangularity of the matrix $A$ allows us to treat the variables step by step. So for any $1 \leq i \leq d$, we already know the closed forms for $x_1,\dots,x_{i-1}$ when computing the closed form for $x_i$. The idea of computing closed forms for the repeated updates of loops was inspired by our previous work on inferring lower bounds on the runtime of integer programs [10]. But in contrast to [10], here the computation of the closed form always succeeds due to the restricted shape of the programs. Finally, we explain how to decide termination of nnt-loops by reasoning about their closed forms in Sect. 4. While our technique does not yield witnesses for non-termination, we show that it yields witnesses for *eventual* non-termination, i.e., vectors $\overline{c}$ such that $f^n(\overline{c})$ witnesses non-termination for some $n \in \mathbb{N}$. Detailed proofs for all lemmas and theorems can be found in [9].

<sup>4</sup> The reason is that every diagonal entry $c_i$ of the triangular matrix $A$ is an eigenvalue and hence a root of the minimal polynomial of $A$. So $(A - c_1 I) \cdots (A - c_k I)$ is the zero matrix iff $(x - c_1) \cdots (x - c_k)$ is the minimal polynomial of $A$, and diagonalizability is equivalent to the fact that the minimal polynomial is a product of distinct linear factors.

<sup>5</sup> For instance, consider $\textbf{while } x > 0 \textbf{ do } x \gets x + y + z_1 + z_2 + z_3;\ y \gets y - 1$.

#### **2 From Triangular to Non-Negative Triangular Loops**

To transform triangular loops into nnt-loops, we define how to *chain* loops. Intuitively, chaining yields a new loop where a single iteration is equivalent to two iterations of the original loop. Then we show that chaining a triangular loop always yields an nnt-loop and that chaining preserves termination, i.e., the chained loop terminates iff the original loop terminates.

**Definition 3 (Chaining).** Chaining *the loop (1) yields:*

$$\text{while } \varphi \land \varphi[\overline{x}/A\overline{x} + \overline{a}] \text{ do } \overline{x} \gets A^2 \overline{x} + A\overline{a} + \overline{a} \tag{2}$$

**Example 4.** *Chaining Example 2 yields*

$$\begin{array}{c} \textbf{while} \ y+z > 0 \land -w-2 \cdot y+x > 0 \ \textbf{do} \\ \begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix} \leftarrow \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & -2 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}^{2} \begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & -2 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \end{bmatrix} \end{array}$$

*which simplifies to the following nnt-loop:*

$$\textbf{while } y + z > 0 \land -w - 2 \cdot y + x > 0 \textbf{ do } \begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix} \leftarrow \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 2 & 0 & 4 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix} + \begin{bmatrix} 2 \\ 2 \\ -2 \\ 1 \end{bmatrix}$$
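Chaining only needs matrix arithmetic. The following is a minimal Python sketch, assuming matrices are plain lists of rows (the helper names are our own): it computes the update $A^2\,\overline{x} + A\,\overline{a} + \overline{a}$ of (2) for Example 4 and checks that one iteration of the chained update coincides with two iterations of the original one. The chained guard $\varphi \land \varphi[\overline{x}/A\overline{x} + \overline{a}]$ is not constructed here.

```python
def mat_mul(A, B):
    """Product of matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_vec(A, v):
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

def update(A, a, v):
    """One application of the update x <- A x + a."""
    return [s + t for s, t in zip(mat_vec(A, v), a)]

def chain(A, a):
    """Update of the chained loop (2): x <- A^2 x + (A a + a)."""
    return mat_mul(A, A), update(A, a, a)

# Update of Example 2 over (w, x, y, z), as shown in Example 4.
A = [[0, 0, 0, 0],
     [0, 1, 0, 0],
     [-1, 0, -2, 0],
     [0, 1, 0, 0]]
a = [2, 1, 0, 0]

A2, a2 = chain(A, a)
print(A2)  # [[0, 0, 0, 0], [0, 1, 0, 0], [2, 0, 4, 0], [0, 1, 0, 0]]
print(a2)  # [2, 2, -2, 1]

# The diagonal of A^2 consists of squares, hence it is non-negative (Lemma 5).
assert all(A2[i][i] >= 0 for i in range(len(A2)))
# One iteration of the chained update equals two iterations of the original.
v = [3, -1, 4, 7]
assert update(A2, a2, v) == update(A, a, update(A, a, v))
```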

Lemma 5 is needed to prove that (2) is an nnt-loop if (1) is triangular.

**Lemma 5 (Squares of Triangular Matrices).** *For every triangular matrix* A*,* A<sup>2</sup> *is a triangular matrix whose diagonal entries are non-negative.*

**Corollary 6 (Chaining Loops).** *If (1) is triangular, then (2) is an nnt-loop.*

*Proof.* Immediate consequence of Definition 3 and Lemma 5. 

**Lemma 7 (Equivalence of Chaining).** *(1) terminates* ⇐⇒ *(2) terminates.*

*Proof.* By Definition 1, (1) does not terminate iff

$$\begin{array}{cl} & \exists \overline{c} \in \mathbb{Z}^d.\ \forall n \in \mathbb{N}.\ \varphi[\overline{x}/f^n(\overline{c})] \\ \iff & \exists \overline{c} \in \mathbb{Z}^d.\ \forall n \in \mathbb{N}.\ \varphi[\overline{x}/f^{2 \cdot n}(\overline{c})] \land \varphi[\overline{x}/f^{2 \cdot n + 1}(\overline{c})] \\ \iff & \exists \overline{c} \in \mathbb{Z}^d.\ \forall n \in \mathbb{N}.\ \varphi[\overline{x}/f^{2 \cdot n}(\overline{c})] \land \varphi[\overline{x}/A\, f^{2 \cdot n}(\overline{c}) + \overline{a}] \quad \text{(by definition of } f\text{)}, \end{array}$$

i.e., iff (2) does not terminate, as $f^2(\overline{x}) = A^2\,\overline{x} + A\,\overline{a} + \overline{a}$ is the update of (2).

**Theorem 8 (Reducing Termination to nnt-Loops).** *Termination of triangular loops is decidable iff termination of nnt-loops is decidable.*

*Proof.* Immediate consequence of Corollary 6 and Lemma 7.

Thus, from now on we restrict our attention to nnt-loops.

#### **3 Computing Closed Forms**

The next step towards our decidability proof is to show that $f^n(\overline{x})$ is equivalent to a vector of *poly-exponential expressions* for each nnt-loop, i.e., the closed form of each nnt-loop can be represented by such expressions. Here, *equivalence* means that two expressions evaluate to the same result for all variable assignments.

Poly-exponential expressions are sums of arithmetic terms where it is always clear which addend determines the asymptotic growth of the whole expression when increasing a designated variable $n$. This is crucial for our decidability proof in Sect. 4. Let $\mathbb{N}_{\geq 1} = \{b \in \mathbb{N} \mid b \geq 1\}$ (and $\mathbb{Q}_{>0}$, $\mathbb{N}_{>1}$, etc. are defined analogously). Moreover, $\mathrm{Af}[\overline{x}]$ is again the set of all affine expressions over $\overline{x}$.

**Definition 9 (Poly-Exponential Expressions).** *Let $\mathcal{C}$ be the set of all finite conjunctions over the literals $n = c$, $n \neq c$ where $n$ is a designated variable and $c \in \mathbb{N}$. Moreover, for each formula $\psi$ over $n$, let $\lceil \psi \rceil$ be the characteristic function of $\psi$, i.e., $\lceil \psi \rceil(c) = 1$ if $\psi[n/c]$ is valid and $\lceil \psi \rceil(c) = 0$ otherwise. The set of all* poly-exponential expressions *over $\overline{x}$ is*

$$\mathbb{PE}[\overline{x}] = \left\{ \sum_{j=1}^{\ell} \lceil \psi_j \rceil \cdot \alpha_j \cdot n^{a_j} \cdot b_j^n \;\middle|\; \ell, a_j \in \mathbb{N},\ \psi_j \in \mathcal{C},\ \alpha_j \in \mathrm{Af}[\overline{x}],\ b_j \in \mathbb{N}_{\geq 1} \right\}.$$

As $n$ ranges over $\mathbb{N}$, we use $n > c$ as syntactic sugar for $\bigwedge_{i=0}^{c} n \neq i$. So an example of a poly-exponential expression is

$$
\lceil n > 2 \rceil \cdot (2 \cdot x + 3 \cdot y - 1) \cdot n^3 \cdot 3^n \; + \; \left\lceil n = 2 \right\rceil \cdot (x - y) .
$$
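To make Definition 9 concrete, here is a hypothetical Python representation of poly-exponential expressions: each addend is a tuple $(\psi_j, \alpha_j, a_j, b_j)$ where $\psi_j$ is a list of literals and $\alpha_j$ is an affine expression given as a constant plus a coefficient map. The sketch only evaluates such expressions for a concrete $n$ and variable assignment, and it encodes the example above.

```python
from fractions import Fraction

def holds(psi, n):
    """Characteristic function of a conjunction psi of literals
    ('==', c) for n = c and ('!=', c) for n != c."""
    return int(all(n == c if op == "==" else n != c for op, c in psi))

def gt(c):
    """Syntactic sugar: n > c stands for n != 0 and ... and n != c."""
    return [("!=", i) for i in range(c + 1)]

def eval_pe(terms, n, env):
    """Evaluate sum_j [psi_j] * alpha_j * n^(a_j) * b_j^n, where alpha_j is an
    affine expression (constant, {variable: coefficient})."""
    total = Fraction(0)
    for psi, (const, coeffs), a, b in terms:
        alpha = Fraction(const) + sum(Fraction(k) * env[v] for v, k in coeffs.items())
        total += holds(psi, n) * alpha * n ** a * b ** n
    return total

# [n > 2] * (2x + 3y - 1) * n^3 * 3^n + [n = 2] * (x - y)
example = [
    (gt(2), (-1, {"x": 2, "y": 3}), 3, 3),
    ([("==", 2)], (0, {"x": 1, "y": -1}), 0, 1),
]
print(eval_pe(example, 2, {"x": 5, "y": 1}))  # 4
print(eval_pe(example, 3, {"x": 1, "y": 0}))  # 729
```

Python's `0 ** 0 == 1` matches the usual convention $n^0 = 1$ for $n = 0$.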

Moreover, note that if $\psi$ contains a *positive* literal (i.e., a literal of the form "$n = c$" for some number $c \in \mathbb{N}$), then $\lceil \psi \rceil$ is equivalent to either $0$ or $\lceil n = c \rceil$.

The crux of the proof that poly-exponential expressions can represent closed forms is to show that certain sums over products of exponential and poly-exponential expressions can be represented by poly-exponential expressions, cf. Lemma 12. To construct these expressions, we use a variant of [1, Lemma 3.5]. As usual, Q[x] is the set of all polynomials over x with rational coefficients.

**Lemma 10 (Expressing Polynomials by Differences** [1]**).** *If $q \in \mathbb{Q}[n]$ and $c \in \mathbb{Q}$, then there is an $r \in \mathbb{Q}[n]$ such that $q = r - c \cdot r[n/n-1]$ for all $n \in \mathbb{N}$.*

So Lemma 10 expresses a polynomial q via the difference of another polynomial r at the positions n and n − 1, where the additional factor c can be chosen freely. The proof of Lemma 10 is by induction on the degree of q and its structure resembles the structure of the following algorithm to compute r. Using the Binomial Theorem, one can verify that q − s + c · s[n/n − 1] has a smaller degree than q, which is crucial for the proof of Lemma 10 and termination of Algorithm 1.

**Algorithm 1.** compute r

**Input:** $q = \sum_{i=0}^{d} c_i \cdot n^i \in \mathbb{Q}[n]$, $c \in \mathbb{Q}$

**Result:** $r \in \mathbb{Q}[n]$ such that $q = r - c \cdot r[n/n-1]$

- **if** $d = 0$ **then**
  - **if** $c = 1$ **then return** $c_0 \cdot n$ **else return** $\frac{c_0}{1-c}$
- **else**
  - **if** $c = 1$ **then** $s \leftarrow \frac{c_d \cdot n^{d+1}}{d+1}$ **else** $s \leftarrow \frac{c_d \cdot n^d}{1-c}$
  - **return** $s + \text{compute r}(q - s + c \cdot s[n/n-1],\ c)$

**Example 11.** *As an example, consider $q = 1$ (i.e., $c_0 = 1$) and $c = 4$. Then we search for an $r$ such that $q = r - c \cdot r[n/n-1]$, i.e., $1 = r - 4 \cdot r[n/n-1]$. According to Algorithm 1, the solution is $r = \frac{c_0}{1-c} = -\frac{1}{3}$.*
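Algorithm 1 translates almost line by line into Python. The sketch below assumes that polynomials in $\mathbb{Q}[n]$ are coefficient lists $[c_0, \dots, c_d]$ of `Fraction`s; `shift` computes $s[n/n-1]$ via the Binomial Theorem. The helper names are our own, and the last line reproduces Example 11.

```python
from fractions import Fraction
from math import comb

def trim(p):
    """Drop trailing zero coefficients (keep at least one)."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    size = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(size)])

def scale(p, k):
    return trim([k * x for x in p])

def shift(p):
    """Coefficients of p[n/n-1], expanded with the Binomial Theorem."""
    out = [Fraction(0)] * len(p)
    for k, ck in enumerate(p):
        for i in range(k + 1):
            out[i] += ck * comb(k, i) * (-1) ** (k - i)
    return trim(out)

def compute_r(q, c):
    """Algorithm 1: given q = [c_0, ..., c_d] (coefficients of q in Q[n])
    and c in Q, return r in Q[n] with q = r - c * r[n/n-1]."""
    q, c = trim([Fraction(x) for x in q]), Fraction(c)
    d = len(q) - 1
    if d == 0:
        return trim([0, q[0]]) if c == 1 else [q[0] / (1 - c)]
    if c == 1:
        s = [Fraction(0)] * (d + 1) + [q[d] / (d + 1)]   # c_d * n^(d+1) / (d+1)
    else:
        s = [Fraction(0)] * d + [q[d] / (1 - c)]         # c_d * n^d / (1-c)
    # q - s + c * s[n/n-1] has a smaller degree than q
    return add(s, compute_r(add(add(q, scale(s, -1)), scale(shift(s), c)), c))

def satisfies_lemma_10(q, c):
    r = compute_r(q, c)
    return add(r, scale(shift(r), -Fraction(c))) == trim([Fraction(x) for x in q])

print(compute_r([1], 4))  # [Fraction(-1, 3)]  (Example 11)
```

For instance, `compute_r([0, 1], 1)` yields the coefficients of $\frac{1}{2} n^2 + \frac{1}{2} n$, whose difference $r - r[n/n-1]$ is indeed $n$.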

**Lemma 12 (Closure of** $\mathbb{PE}$ **under Sums of Products and Exponentials).** *If $m \in \mathbb{N}$ and $p \in \mathbb{PE}[\overline{x}]$, then one can compute a $q \in \mathbb{PE}[\overline{x}]$ which is equivalent to $\sum_{i=1}^{n} m^{n-i} \cdot p[n/i-1]$.*

*Proof.* Let $p = \sum_{j=1}^{\ell} \lceil \psi_j \rceil \cdot \alpha_j \cdot n^{a_j} \cdot b_j^n$. We have:

$$\sum\_{i=1}^{n} m^{n-i} \cdot p[n/i - 1] = \sum\_{j=1}^{\ell} \sum\_{i=1}^{n} \left[ \psi\_j \right] (i-1) \cdot m^{n-i} \cdot \alpha\_j \cdot (i-1)^{a\_j} \cdot b\_j^{i-1} \tag{3}$$

As PE[x] is closed under addition, it suffices to show that we can compute an equivalent poly-exponential expression for any expression of the form

$$\sum\_{i=1}^{n} \ \left[ \psi \right](i-1) \cdot m^{n-i} \cdot \alpha \cdot (i-1)^a \cdot b^{i-1}. \tag{4}$$

We first regard the case m = 0. Here, the expression (4) can be simplified to

$$\lceil n \neq 0 \rceil \cdot \lceil \psi[n/n-1] \rceil \cdot \alpha \cdot (n-1)^a \cdot b^{n-1}. \tag{5}$$

Clearly, there is a $\psi' \in \mathcal{C}$ such that $\lceil \psi' \rceil$ is equivalent to $\lceil n \neq 0 \rceil \cdot \lceil \psi[n/n-1] \rceil$. Moreover, $\alpha \cdot b^{n-1} = \frac{\alpha}{b} \cdot b^n$ where $\frac{\alpha}{b} \in \mathrm{Af}[\overline{x}]$. Hence, due to the Binomial Theorem,

$$\lceil n \neq 0 \rceil \cdot \lceil \psi[n/n-1] \rceil \cdot \alpha \cdot (n-1)^a \cdot b^{n-1} = \sum_{i=0}^{a} \lceil \psi' \rceil \cdot \frac{\alpha}{b} \cdot \binom{a}{i} \cdot (-1)^i \cdot n^{a-i} \cdot b^n \tag{6}$$

which is a poly-exponential expression as $\frac{\alpha}{b} \cdot \binom{a}{i} \cdot (-1)^i \in \mathrm{Af}[\overline{x}]$.

From now on, let m ≥ 1. If ψ contains a positive literal n = c, then we get

$$\begin{array}{lll} & \sum_{i=1}^{n} \lceil \psi \rceil(i-1) \cdot m^{n-i} \cdot \alpha \cdot (i-1)^a \cdot b^{i-1} & \\ = & \sum_{i=1}^{n} \lceil n > i-1 \rceil \cdot \lceil \psi \rceil(i-1) \cdot m^{n-i} \cdot \alpha \cdot (i-1)^a \cdot b^{i-1} & (\dagger) \\ = & \lceil n > c \rceil \cdot \lceil \psi \rceil(c) \cdot m^{n-c-1} \cdot \alpha \cdot c^a \cdot b^c & (\dagger\dagger) \\ = & \begin{cases} 0, & \text{if } \lceil \psi \rceil(c) = 0 \\ \lceil n > c \rceil \cdot \frac{1}{m^{c+1}} \cdot \alpha \cdot c^a \cdot b^c \cdot m^n, & \text{if } \lceil \psi \rceil(c) = 1 \end{cases} & \\ \in & \mathbb{PE}[\overline{x}] \quad \text{(since } \frac{1}{m^{c+1}} \cdot \alpha \cdot c^a \cdot b^c \in \mathrm{Af}[\overline{x}]\text{)} & \end{array} \tag{7}$$

The step marked with (†) holds as we have $\lceil n > i-1 \rceil = 1$ for all $i \in \{1,\dots,n\}$, and the step marked with (††) holds since $i \neq c+1$ implies $\lceil \psi \rceil(i-1) = 0$. If $\psi$ does not contain a positive literal, then let $c$ be the maximal constant that occurs in $\psi$, or $-1$ if $\psi$ is empty. We get:

$$\begin{array}{lll} & \sum_{i=1}^{n} \lceil \psi \rceil(i-1) \cdot m^{n-i} \cdot \alpha \cdot (i-1)^{a} \cdot b^{i-1} & \\ = & \sum_{i=1}^{n} \lceil n > i-1 \rceil \cdot \lceil \psi \rceil(i-1) \cdot m^{n-i} \cdot \alpha \cdot (i-1)^{a} \cdot b^{i-1} & (\dagger) \\ = & \sum_{i=1}^{c+1} \lceil n > i-1 \rceil \cdot \lceil \psi \rceil(i-1) \cdot m^{n-i} \cdot \alpha \cdot (i-1)^{a} \cdot b^{i-1} & \\ & + \sum_{i=c+2}^{n} m^{n-i} \cdot \alpha \cdot (i-1)^{a} \cdot b^{i-1} & \end{array} \tag{8}$$

Again, the step marked with (†) holds since we have $\lceil n > i-1 \rceil = 1$ for all $i \in \{1,\dots,n\}$. The last step holds as $i \geq c+2$ implies $\lceil \psi \rceil(i-1) = 1$. Similar to the case where $\psi$ contains a positive literal, we can compute a poly-exponential expression which is equivalent to the first addend. We have

$$\begin{array}{ll} & \sum_{i=1}^{c+1} \lceil n > i-1 \rceil \cdot \lceil \psi \rceil(i-1) \cdot m^{n-i} \cdot \alpha \cdot (i-1)^a \cdot b^{i-1} \\ = & \sum_{\substack{1 \le i \le c+1 \\ \lceil \psi \rceil(i-1) = 1}} \lceil n > i-1 \rceil \cdot \frac{1}{m^i} \cdot \alpha \cdot (i-1)^a \cdot b^{i-1} \cdot m^n \end{array} \tag{9}$$

which is a poly-exponential expression as $\frac{1}{m^i} \cdot \alpha \cdot (i-1)^a \cdot b^{i-1} \in \mathrm{Af}[\overline{x}]$. For the second addend, we have:

$$\begin{aligned} &\quad \sum_{i=c+2}^{n} m^{n-i} \cdot \alpha \cdot (i-1)^{a} \cdot b^{i-1} \\ &= \frac{\alpha}{b} \cdot m^{n} \cdot \sum_{i=c+2}^{n} (i-1)^{a} \cdot \left(\frac{b}{m}\right)^{i} \\ &= \frac{\alpha}{b} \cdot m^{n} \cdot \sum_{i=c+2}^{n} \left(r[n/i] - \frac{m}{b} \cdot r[n/i-1]\right) \cdot \left(\frac{b}{m}\right)^{i} \quad \text{(Lemma 10 for } (n-1)^a \text{ and the factor } \tfrac{m}{b}\text{)} \\ &= \frac{\alpha}{b} \cdot m^{n} \cdot \left(\sum_{i=c+2}^{n} r[n/i] \cdot \left(\frac{b}{m}\right)^{i} - \sum_{i=c+2}^{n} \frac{m}{b} \cdot r[n/i-1] \cdot \left(\frac{b}{m}\right)^{i}\right) \\ &= \frac{\alpha}{b} \cdot m^{n} \cdot \left(\sum_{i=c+2}^{n} r[n/i] \cdot \left(\frac{b}{m}\right)^{i} - \sum_{i=c+1}^{n-1} r[n/i] \cdot \left(\frac{b}{m}\right)^{i}\right) \\ &= \frac{\alpha}{b} \cdot m^{n} \cdot \lceil n > c+1 \rceil \cdot \left(r \cdot \left(\frac{b}{m}\right)^{n} - r[n/c+1] \cdot \left(\frac{b}{m}\right)^{c+1}\right) \\ &= \lceil n > c+1 \rceil \cdot \frac{\alpha}{b} \cdot r \cdot b^{n} - \lceil n > c+1 \rceil \cdot r[n/c+1] \cdot \left(\frac{b}{m}\right)^{c+1} \cdot \frac{\alpha}{b} \cdot m^{n} \end{aligned} \tag{10}$$

Lemma 10 ensures $r \in \mathbb{Q}[n]$, i.e., we have $r = \sum_{i=0}^{d_r} m_i \cdot n^i$ for some $d_r \in \mathbb{N}$ and $m_i \in \mathbb{Q}$. Thus, $r[n/c+1] \cdot \left(\frac{b}{m}\right)^{c+1} \cdot \frac{\alpha}{b} \in \mathrm{Af}[\overline{x}]$, which implies $\lceil n > c+1 \rceil \cdot r[n/c+1] \cdot \left(\frac{b}{m}\right)^{c+1} \cdot \frac{\alpha}{b} \cdot m^n \in \mathbb{PE}[\overline{x}]$. It remains to show that the addend $\lceil n > c+1 \rceil \cdot \frac{\alpha}{b} \cdot r \cdot b^n$ is equivalent to a poly-exponential expression. As $\frac{\alpha}{b} \cdot m_i \in \mathrm{Af}[\overline{x}]$, we have

$$\lceil n > c+1 \rceil \cdot \frac{\alpha}{b} \cdot r \cdot b^n = \sum_{i=0}^{d_r} \lceil n > c+1 \rceil \cdot \frac{\alpha}{b} \cdot m_i \cdot n^i \cdot b^n \in \mathbb{PE}[\overline{x}]. \tag{11}$$

 

The proof of Lemma 12 gives rise to a corresponding algorithm.

**Algorithm 2.** symbolic sum

**Input:** $m \in \mathbb{N}$, $p \in \mathbb{PE}[\overline{x}]$

**Result:** $q \in \mathbb{PE}[\overline{x}]$ which is equivalent to $\sum_{i=1}^{n} m^{n-i} \cdot p[n/i-1]$

- rearrange $\sum_{i=1}^{n} m^{n-i} \cdot p[n/i-1]$ to $\sum_{j=1}^{\ell} p_j$ as in (3)
- **foreach** $p_j \in \{p_1,\dots,p_\ell\}$ **do**
  - **if** $m = 0$ **then** compute $q_j$ as in (5) and (6)
  - **else if** the condition $\psi$ of $p_j$ contains a positive literal $n = c$ **then** compute $q_j$ as in (7)
  - **else**
    - split $p_j$ into two sums $p_{j,1}$ and $p_{j,2}$ as in (8)
    - compute $q_{j,1}$ from $p_{j,1}$ as in (9)
    - compute $q_{j,2}$ from $p_{j,2}$ as in (10) and (11) using Algorithm 1
    - $q_j \leftarrow q_{j,1} + q_{j,2}$
- **return** $\sum_{j=1}^{\ell} q_j$
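As a sanity check of the most involved branch (steps (10) and (11)), the following Python sketch computes the closed form of the second addend of (8) for $\alpha = 1$, using Lemma 10 for $(n-1)^a$ with the factor $\frac{m}{b}$, and compares it with direct summation. It bundles its own compact transcription of Algorithm 1 so that it runs standalone; all names are hypothetical and the sketch covers only this one case of Algorithm 2.

```python
from fractions import Fraction
from math import comb

def trim(p):
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def lin(*terms):
    """Linear combination sum(k * p) of coefficient lists p."""
    size = max(len(p) for _, p in terms)
    return trim([sum(k * (p[i] if i < len(p) else 0) for k, p in terms)
                 for i in range(size)])

def shift(p):
    """Coefficients of p[n/n-1] (Binomial Theorem)."""
    out = [Fraction(0)] * len(p)
    for k, ck in enumerate(p):
        for i in range(k + 1):
            out[i] += ck * comb(k, i) * (-1) ** (k - i)
    return trim(out)

def compute_r(q, c):
    """Algorithm 1: r in Q[n] with q = r - c * r[n/n-1]."""
    q = trim([Fraction(x) for x in q])
    d = len(q) - 1
    if d == 0:
        return trim([0, q[0]]) if c == 1 else [q[0] / (1 - c)]
    if c == 1:
        s = [Fraction(0)] * (d + 1) + [q[d] / (d + 1)]
    else:
        s = [Fraction(0)] * d + [q[d] / (1 - c)]
    return lin((1, s), (1, compute_r(lin((1, q), (-1, s), (c, shift(s))), c)))

def peval(p, x):
    return sum(ck * Fraction(x) ** k for k, ck in enumerate(p))

def second_addend(m, a, b, c):
    """Closed form (10) of sum_{i=c+2}^n m^(n-i) * (i-1)^a * b^(i-1), alpha = 1."""
    q = shift([Fraction(0)] * a + [Fraction(1)])  # (n-1)^a as a polynomial in n
    r = compute_r(q, Fraction(m, b))              # Lemma 10 with factor m/b
    k = peval(r, c + 1) * Fraction(b, m) ** (c + 1) / b
    return lambda n: (n > c + 1) * (peval(r, n) * Fraction(b) ** n / b - k * Fraction(m) ** n)

def brute_force(m, a, b, c, n):
    return sum(Fraction(m) ** (n - i) * (i - 1) ** a * Fraction(b) ** (i - 1)
               for i in range(c + 2, n + 1))

print([second_addend(2, 1, 1, -1)(n) for n in range(5)])
```

With $m = 4$, $b = 1$, $a = 0$, $c = 0$, the closed form equals $\lceil n > 1 \rceil \cdot (-\frac{1}{3} + \frac{1}{12} \cdot 4^n)$, i.e., a quarter of the expression $q_2$ that Example 13 derives.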

**Example 13.** *We compute an equivalent poly-exponential expression for*

$$\sum\_{i=1}^{n} 4^{n-i} \cdot \left( \left\lceil n = 0 \right\rceil \cdot 2 \cdot w + \left\lceil n \neq 0 \right\rceil \cdot 4 - 2 \right) \left[ n/i - 1 \right] \tag{12}$$

*where* w *is a variable. (It will be needed later to compute a closed form for Example 4, see Example 18.) According to Algorithm 2 and* (3)*, we get*

$$\begin{array}{l} \sum_{i=1}^{n} 4^{n-i} \cdot \left( \lceil n = 0 \rceil \cdot 2 \cdot w + \lceil n \neq 0 \rceil \cdot 4 - 2 \right) [n/i - 1] \\ = \sum_{i=1}^{n} 4^{n-i} \cdot \left( \lceil i - 1 = 0 \rceil \cdot 2 \cdot w + \lceil i - 1 \neq 0 \rceil \cdot 4 - 2 \right) \\ = p_1 + p_2 + p_3 \end{array}$$

*with $p_1 = \sum_{i=1}^{n} \lceil i - 1 = 0 \rceil \cdot 4^{n-i} \cdot 2 \cdot w$, $p_2 = \sum_{i=1}^{n} \lceil i - 1 \neq 0 \rceil \cdot 4^{n-i} \cdot 4$, and $p_3 = \sum_{i=1}^{n} 4^{n-i} \cdot (-2)$. We search for $q_1, q_2, q_3 \in \mathbb{PE}[w]$ that are equivalent to $p_1, p_2, p_3$, i.e., $q_1 + q_2 + q_3$ is equivalent to* (12)*. We only show how to compute $q_2$ (and omit the computation of $q_1 = \lceil n \neq 0 \rceil \cdot \frac{1}{2} \cdot w \cdot 4^n$ and $q_3 = \frac{2}{3} - \frac{2}{3} \cdot 4^n$). Analogously to* (8)*, we get:*

$$\begin{array}{l} \sum\_{i=1}^{n} \left\lbrack i-1 \neq 0 \right\rbrack \cdot 4^{n-i} \cdot 4 \\ = \sum\_{i=1}^{n} \left\lbrack n > i-1 \right\rbrack \cdot \left\lbrack i-1 \neq 0 \right\rbrack \cdot 4^{n-i} \cdot 4 \\ = \sum\_{i=1}^{1} \left\lbrack n > i-1 \right\rbrack \cdot \left\lbrack i-1 \neq 0 \right\rbrack \cdot 4^{n-1} \cdot 4 \\ \quad + \sum\_{i=2}^{n} 4^{n-i} \cdot 4 \end{array}$$

*The next step is to rearrange the first sum as in* (9)*. In our example, it directly simplifies to* 0 *and hence we obtain*

$$\sum\_{i=1}^{1} \left[ n > i - 1 \right] \cdot \left[ i - 1 \neq 0 \right] \cdot 4^{n-1} \cdot 4 + \sum\_{i=2}^{n} 4^{n-i} \cdot 4 = \sum\_{i=2}^{n} 4^{n-i} \cdot 4.$$

*Finally, by applying the steps from* (10) *we get:*

$$\begin{array}{l} \sum_{i=2}^{n} 4^{n-i} \cdot 4 \\ = 4 \cdot 4^{n} \cdot \sum_{i=2}^{n} \left(\frac{1}{4}\right)^{i} \\ = 4 \cdot 4^{n} \cdot \sum_{i=2}^{n} \left(-\frac{1}{3} - 4 \cdot \left(-\frac{1}{3}\right)\right) \cdot \left(\frac{1}{4}\right)^{i} \quad (\dagger) \\ = 4 \cdot 4^{n} \cdot \left(\sum_{i=2}^{n} \left(-\frac{1}{3}\right) \cdot \left(\frac{1}{4}\right)^{i} - \sum_{i=2}^{n} 4 \cdot \left(-\frac{1}{3}\right) \cdot \left(\frac{1}{4}\right)^{i}\right) \\ = 4 \cdot 4^{n} \cdot \left(\sum_{i=2}^{n} \left(-\frac{1}{3}\right) \cdot \left(\frac{1}{4}\right)^{i} - \sum_{i=1}^{n-1} \left(-\frac{1}{3}\right) \cdot \left(\frac{1}{4}\right)^{i}\right) \\ = 4 \cdot 4^{n} \cdot \lceil n > 1 \rceil \cdot \left(\left(-\frac{1}{3}\right) \cdot \left(\frac{1}{4}\right)^{n} - \left(-\frac{1}{3}\right) \cdot \frac{1}{4}\right) \\ = \lceil n > 1 \rceil \cdot \left(-\frac{4}{3}\right) + \lceil n > 1 \rceil \cdot \frac{1}{3} \cdot 4^{n} \\ = q_{2} \end{array}$$

*The step marked with* (†) *holds by Lemma 10 with $q = 1$ and $c = 4$. Thus, we have $r = -\frac{1}{3}$, cf. Example 11.*
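The result of Example 13 can be spot-checked numerically. This Python sketch (our own, not part of the paper) evaluates the sum (12) directly and compares it with $q_1 + q_2 + q_3$, using exact rational arithmetic.

```python
from fractions import Fraction

def sum_12(n, w):
    """Direct evaluation of (12): sum_{i=1}^n 4^(n-i) * (2*q_w - 2)[n/i-1]."""
    total = 0
    for i in range(1, n + 1):
        k = i - 1  # the substitution [n/i-1]
        total += 4 ** (n - i) * ((2 * w if k == 0 else 0) + (4 if k != 0 else 0) - 2)
    return total

def closed_13(n, w):
    """q_1 + q_2 + q_3 as computed in Example 13."""
    q1 = (n != 0) * Fraction(1, 2) * w * 4 ** n
    q2 = (n > 1) * (Fraction(-4, 3) + Fraction(1, 3) * 4 ** n)
    q3 = Fraction(2, 3) - Fraction(2, 3) * 4 ** n
    return q1 + q2 + q3

for n in range(5):
    print(n, sum_12(n, 1), closed_13(n, 1))
```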

Recall that our goal is to compute closed forms for loops. As a first step, instead of the $n$-fold update function $h(n, \overline{x}) = f^n(\overline{x})$ of (1) where $f$ is the update of (1), we consider a recursive update function for a single variable $x \in \overline{x}$:

$$g(0, \overline{x}) = x \quad \text{and} \quad g(n, \overline{x}) = m \cdot g(n-1, \overline{x}) + p[n/n-1] \quad \text{for all } n > 0$$

Here, $m \in \mathbb{N}$ and $p \in \mathbb{PE}[\overline{x}]$. Using Lemma 12, it is easy to show that $g$ can be represented by a poly-exponential expression.

**Lemma 14 (Closed Form for Single Variables).** *If $x \in \overline{x}$, $m \in \mathbb{N}$, and $p \in \mathbb{PE}[\overline{x}]$, then one can compute a $q \in \mathbb{PE}[\overline{x}]$ which satisfies*

$$q\left[n/0\right] = x \quad \text{and} \quad q = \left(m \cdot q + p\right)\left[n/n - 1\right] \quad \text{for all } n > 0.$$

*Proof.* It suffices to find a $q \in \mathbb{PE}[\overline{x}]$ that satisfies

$$q = m^n \cdot x + \sum\_{i=1}^n m^{n-i} \cdot p[n/i - 1]. \tag{13}$$

To see why (13) is sufficient, note that (13) implies

$$q[n/0] = m^0 \cdot x + \sum_{i=1}^{0} m^{0-i} \cdot p[n/i-1] \quad = \quad x$$

and for n > 0, (13) implies

$$\begin{aligned} q &= m^n \cdot x + \sum\_{i=1}^n m^{n-i} \cdot p[n/i - 1] \\ &= m^n \cdot x + \left(\sum\_{i=1}^{n-1} m^{n-i} \cdot p[n/i - 1]\right) + p[n/n - 1] \\ &= m \cdot \left(m^{n-1} \cdot x + \sum\_{i=1}^{n-1} m^{n-i-1} \cdot p[n/i - 1]\right) + p[n/n - 1] \\ &= m \cdot q[n/n - 1] + p[n/n - 1] \\ &= (m \cdot q + p)[n/n - 1]. \end{aligned}$$

By Lemma 12, we can compute a $q' \in \mathbb{PE}[\overline{x}]$ such that

$$m^n \cdot x + \sum\_{i=1}^n m^{n-i} \cdot p[n/i - 1] \quad = \quad m^n \cdot x + q'.$$

Moreover,

$$\text{if } m = 0, \text{ then } m^n \cdot x = \lceil n = 0 \rceil \cdot x \in \mathbb{PE}[\overline{x}] \text{ and} \tag{14}$$

$$\text{if } m > 0, \text{ then } m^n \cdot x \in \mathbb{PE}[\overline{x}]. \tag{15}$$

So both addends are equivalent to poly-exponential expressions. 
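The proof above shows that (13) solves the recurrence for $g$. A small Python check (illustrative only; the function names and the sample $p$ are assumptions) compares (13) with the recursive definition for several $m$, including $m = 0$, where Python's `0 ** 0 == 1` plays the role of case (14).

```python
def closed_form_13(m, x, p, n):
    """(13): m^n * x + sum_{i=1}^n m^(n-i) * p(i-1), with 0 ** 0 == 1."""
    return m ** n * x + sum(m ** (n - i) * p(i - 1) for i in range(1, n + 1))

def g(m, x, p, n):
    """g(0) = x and g(n) = m * g(n-1) + p[n/n-1] for n > 0."""
    value = x
    for k in range(1, n + 1):
        value = m * value + p(k - 1)
    return value

# A hypothetical poly-exponential p: [n != 0] * 3 * n^2 * 2^n - 5
def p(n):
    return (n != 0) * 3 * n ** 2 * 2 ** n - 5

for m in range(4):
    print(m, [closed_form_13(m, 7, p, n) for n in range(5)])
```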

**Example 15.** *We show how to compute the closed forms for the variables $w$ and $x$ from Example 4. We first consider the assignment $w \leftarrow 2$, i.e., we want to compute a $q_w \in \mathbb{PE}[w, x, y, z]$ with $q_w[n/0] = w$ and $q_w = (m_w \cdot q_w + p_w)[n/n-1]$ for $n > 0$, where $m_w = 0$ and $p_w = 2$. According to* (13) *and* (14)*, $q_w$ is*

$$m_w^n \cdot w + \sum_{i=1}^n m_w^{n-i} \cdot p_w[n/i - 1] = 0^n \cdot w + \sum_{i=1}^n 0^{n-i} \cdot 2 = \lceil n = 0 \rceil \cdot w + \lceil n \neq 0 \rceil \cdot 2.$$

*For the assignment $x \leftarrow x + 2$, we search for a $q_x$ such that $q_x[n/0] = x$ and $q_x = (m_x \cdot q_x + p_x)[n/n-1]$ for $n > 0$, where $m_x = 1$ and $p_x = 2$. By* (13)*, $q_x$ is*

$$m\_x^n \cdot x + \sum\_{i=1}^n m\_x^{n-i} \cdot p\_x[n/i-1] = 1^n \cdot x + \sum\_{i=1}^n 1^{n-i} \cdot 2 = x + 2 \cdot n.$$

The restriction to triangular matrices now allows us to generalize Lemma 14 to vectors of variables. The reason is that due to triangularity, the update of each program variable $x_i$ only depends on the previous values of $x_1,\dots,x_i$. So when regarding $x_i$, we can assume that we already know the closed forms for $x_1,\dots,x_{i-1}$. This allows us to find closed forms for one variable after the other by applying Lemma 14 repeatedly. In other words, it allows us to find a vector $\overline{q}$ of poly-exponential expressions that satisfies

$$
\overline{q}\left[n/0\right] = \overline{x} \quad \text{and} \quad \overline{q} = A\,\overline{q}[n/n-1] + \overline{a} \quad \text{for all } n > 0.
$$

To prove this claim, we show the more general Lemma 16. For all $i_1,\dots,i_k \in \{1,\dots,m\}$, we define $[z_1,\dots,z_m]_{i_1,\dots,i_k} = [z_{i_1},\dots,z_{i_k}]$ (and the notation $\overline{y}_{i_1,\dots,i_k}$ for column vectors is defined analogously). Moreover, for a matrix $A$, $A_i$ is $A$'s $i$-th row and $A_{i_1,\dots,i_n;j_1,\dots,j_k}$ is the matrix with rows $(A_{i_1})_{j_1,\dots,j_k},\dots,(A_{i_n})_{j_1,\dots,j_k}$. So for $A = \begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \end{bmatrix}$, we have $A_{1,2;1,3} = \begin{bmatrix} a_{1,1} & a_{1,3} \\ a_{2,1} & a_{2,3} \end{bmatrix}$.
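The index notation can be mirrored by two tiny Python helpers (hypothetical names, with 1-based indices as in the text) that reproduce the example $A_{1,2;1,3}$:

```python
def select(v, idx):
    """[z_1, ..., z_m]_{i_1, ..., i_k} with 1-based indices."""
    return [v[i - 1] for i in idx]

def submatrix(A, rows, cols):
    """A_{i_1,...,i_n; j_1,...,j_k}: the rows (A_{i_1})_{j_1,...,j_k}, ..."""
    return [select(A[i - 1], cols) for i in rows]

A = [["a11", "a12", "a13"],
     ["a21", "a22", "a23"],
     ["a31", "a32", "a33"]]
print(submatrix(A, [1, 2], [1, 3]))  # [['a11', 'a13'], ['a21', 'a23']]
```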

**Lemma 16 (Closed Forms for Vectors of Variables).** *If $\overline{x}$ is a vector of at least $d \geq 1$ pairwise different variables, $A \in \mathbb{Z}^{d \times d}$ is triangular with $A_{i;i} \geq 0$ for all $1 \leq i \leq d$, and $\overline{p} \in \mathbb{PE}[\overline{x}]^d$, then one can compute $\overline{q} \in \mathbb{PE}[\overline{x}]^d$ such that:*

$$
\overline{q}\left[n/0\right] = \overline{x}\_{1,\ldots,d} \quad \text{and} \tag{16}
$$

$$\overline{q} = (A\,\overline{q} + \overline{p})\,\left[n/n - 1\right]\quad\text{for all }n > 0\tag{17}$$

*Proof.* Assume that A is lower triangular (the case that A is upper triangular works analogously). We use induction on d. For any d ≥ 1 we have:

$$\begin{array}{rll} & \overline{q} = (A\overline{q} + \overline{p})[n/n-1] & \\ \iff & \overline{q}_j = (A_j \cdot \overline{q} + \overline{p}_j)[n/n-1] & \text{for all } 1 \le j \le d \\ \iff & \overline{q}_j = (A_{j;2,\dots,d} \cdot \overline{q}_{2,\dots,d} + A_{j;1} \cdot \overline{q}_1 + \overline{p}_j)[n/n-1] & \text{for all } 1 \le j \le d \\ \iff & \overline{q}_1 = (A_{1;2,\dots,d} \cdot \overline{q}_{2,\dots,d} + A_{1;1} \cdot \overline{q}_1 + \overline{p}_1)[n/n-1] \ \land & \\ & \overline{q}_j = (A_{j;2,\dots,d} \cdot \overline{q}_{2,\dots,d} + A_{j;1} \cdot \overline{q}_1 + \overline{p}_j)[n/n-1] & \text{for all } 1 < j \le d \\ \iff & \overline{q}_1 = (A_{1;1} \cdot \overline{q}_1 + \overline{p}_1)[n/n-1] \ \land & \\ & \overline{q}_j = (A_{j;2,\dots,d} \cdot \overline{q}_{2,\dots,d} + A_{j;1} \cdot \overline{q}_1 + \overline{p}_j)[n/n-1] & \text{for all } 1 < j \le d \end{array}$$

The last step holds as $A$ is lower triangular. By Lemma 14, we can compute a $\overline{q}_1 \in \mathbb{PE}[\overline{x}]$ that satisfies

$$
\overline{q}\_1[n/0] = \overline{x}\_1 \quad \text{and} \quad \overline{q}\_1 = (A\_{1;1} \cdot \overline{q}\_1 + \overline{p}\_1) \ [n/n - 1] \quad \text{for all } n > 0.
$$

In the induction base ($d = 1$), there is no $j$ with $1 < j \leq d$. In the induction step ($d > 1$), it remains to show that we can compute $\overline{q}_{2,\dots,d}$ such that

$$
\overline{q}\_j[n/0] = \overline{x}\_j \quad \text{and} \quad \overline{q}\_j = (A\_{j;2,...,d} \cdot \overline{q}\_{2,...,d} + A\_{j;1} \cdot \overline{q}\_1 + \overline{p}\_j) \ [n/n - 1],
$$

for all n > 0 and all 1 < j ≤ d, which is equivalent to

$$\begin{aligned} \overline{q}\_{2,\ldots,d}[n/0] &= \overline{x}\_{2,\ldots,d} \quad \text{and} \\ \overline{q}\_{2,\ldots,d} &= (A\_{2,\ldots,d;2,\ldots,d} \cdot \overline{q}\_{2,\ldots,d} + \begin{bmatrix} A\_{2;1} \\ \vdots \\ A\_{d;1} \end{bmatrix} \cdot \overline{q}\_1 + \overline{p}\_{2,\ldots,d}) \begin{bmatrix} n/n-1 \end{bmatrix} \end{aligned}$$

for all $n > 0$. As $A_{j;1} \cdot \overline{q}_1 + \overline{p}_j \in \mathbb{PE}[\overline{x}]$ for each $2 \leq j \leq d$, the claim follows from the induction hypothesis.

Together, Lemmas 14 and 16 and their proofs give rise to the following algorithm to compute a solution for (16) and (17). It computes a closed form $\overline{q}_1$ for $\overline{x}_1$ as in the proof of Lemma 14, constructs the argument $\overline{p}$ for the recursive call based on $A$, $\overline{q}_1$, and the current value of $\overline{p}$ as in the proof of Lemma 16, and then determines the closed form for $\overline{x}_{2,\dots,d}$ recursively.

**Algorithm 3.** closed form

**Input:** $\overline{x}_{1,\dots,d}$, $A \in \mathbb{Z}^{d \times d}$ where $A_{i;i} \geq 0$ for all $1 \leq i \leq d$, $\overline{p} \in \mathbb{PE}[\overline{x}]^d$

**Result:** $\overline{q} \in \mathbb{PE}[\overline{x}]^d$ which satisfies (16) and (17) for the given $\overline{x}$, $A$, and $\overline{p}$

- $q' \leftarrow$ symbolic sum$(A_{1;1}, \overline{p}_1)$ (cf. Algorithm 2)
- **if** $A_{1;1} = 0$ **then** $\overline{q}_1 \leftarrow \lceil n = 0 \rceil \cdot \overline{x}_1 + q'$ **else** $\overline{q}_1 \leftarrow A_{1;1}^n \cdot \overline{x}_1 + q'$ (cf. (13), (14), and (15))
- **if** $d > 1$ **then** $\overline{q}_{2,\dots,d} \leftarrow$ closed form$\left(\overline{x}_{2,\dots,d},\ A_{2,\dots,d;2,\dots,d},\ \begin{bmatrix} A_{2;1} \\ \vdots \\ A_{d;1} \end{bmatrix} \cdot \overline{q}_1 + \overline{p}_{2,\dots,d}\right)$
- **return** $\overline{q}$
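To illustrate the recursion of Algorithm 3, here is a numeric Python sketch under a simplifying assumption: instead of computing poly-exponential expressions via Algorithm 2, each closed form is a Python function of $n$ that evaluates (13) by explicit summation. It keeps the structure of Algorithm 3 for a lower triangular $A$ (handle the first variable, then recurse on $A_{2,\dots,d;2,\dots,d}$ with the new $\overline{p}$) and compares the result with direct iteration of the nnt-loop of Example 4. The names are hypothetical, and the nested summations make it practical only for small $n$.

```python
def closed_form(x0, A, p):
    """Numeric analogue of Algorithm 3 for a lower triangular A with
    A[i][i] >= 0. x0: start values; p: functions n -> value (one per row).
    Returns functions n -> value of each variable after n iterations."""
    m, x1, p1 = A[0][0], x0[0], p[0]

    def q1(n):
        # (13); the explicit sum replaces the symbolic sum of Algorithm 2
        return m ** n * x1 + sum(m ** (n - i) * p1(i - 1) for i in range(1, n + 1))

    if len(x0) == 1:
        return [q1]
    # Recursive call on A_{2..d;2..d} with [A_{2;1}, ..., A_{d;1}] * q1 + p_{2..d}
    sub_A = [row[1:] for row in A[1:]]
    sub_p = [(lambda n, j=j: A[j][0] * q1(n) + p[j](n)) for j in range(1, len(x0))]
    return [q1] + closed_form(x0[1:], sub_A, sub_p)

# nnt-loop of Example 4 over (w, x, y, z)
A = [[0, 0, 0, 0], [0, 1, 0, 0], [2, 0, 4, 0], [0, 1, 0, 0]]
a = [2, 2, -2, 1]

def iterate(v, n):
    """Apply the update x <- A x + a n times."""
    for _ in range(n):
        v = [sum(A[i][k] * v[k] for k in range(4)) + a[i] for i in range(4)]
    return v

start = [5, -3, 1, 2]
qs = closed_form(start, A, [(lambda n, c=c: c) for c in a])
print([q(3) for q in qs], iterate(start, 3))
```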

We can now prove the main theorem of this section.

**Theorem 17 (Closed Forms for nnt-Loops).** *One can compute a closed form for every nnt-loop. In other words, if $f: \mathbb{Z}^d \to \mathbb{Z}^d$ is the update function of an nnt-loop with the variables $\overline{x}$, then one can compute a $\overline{q} \in \mathbb{PE}[\overline{x}]^d$ such that $\overline{q}[n/c] = f^c(\overline{x})$ for all $c \in \mathbb{N}$.*

*Proof.* Consider an nnt-loop of the form (1). By Lemma 16, we can compute a $\overline{q} \in \mathbb{PE}[\overline{x}]^d$ that satisfies

$$
\overline{q}[n/0] = \overline{x} \quad \text{and} \quad \overline{q} = (A\,\overline{q} + \overline{a})\,\,\left[n/n - 1\right] \quad \text{for all } n > 0.
$$

We prove $f^c(\overline{x}) = \overline{q}[n/c]$ by induction on $c \in \mathbb{N}$. If $c = 0$, we get

$$f^c(\overline{x}) = f^0(\overline{x}) = \overline{x} = \overline{q}[n/0] = \overline{q}[n/c].$$

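If $c > 0$, we get:

$$\begin{array}{rll} f^c(\overline{x}) & = A\, f^{c-1}(\overline{x}) + \overline{a} & \text{by definition of } f \\ & = A\, \overline{q}[n/c-1] + \overline{a} & \text{by the induction hypothesis} \\ & = (A\, \overline{q} + \overline{a})[n/c-1] & \text{as } \overline{a} \in \mathbb{Z}^d \text{ does not contain } n \\ & = \overline{q}[n/c] & \text{as } \overline{q} = (A\, \overline{q} + \overline{a})[n/n-1] \text{ holds for } n = c > 0 \end{array}$$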

So invoking Algorithm 3 on x, A, and a yields the closed form of an nnt-loop (1).

**Example 18.** *We show how to compute the closed form for Example 4. For*

$$y \leftarrow 2 \cdot w + 4 \cdot y - 2,$$

*we obtain*

$$
\begin{aligned}
q\_y &= \left( 4 \cdot q\_y + 2 \cdot q\_w - 2 \right) [n/n - 1] \\
&= 4^{n} \cdot y + \sum\_{i=1}^{n} 4^{n-i} \cdot \left( 2 \cdot q\_w - 2 \right) [n/i - 1] && \text{(by (13))} \\
&= y \cdot 4^{n} + \sum\_{i=1}^{n} 4^{n-i} \cdot \left( [\![ n = 0 ]\!] \cdot 2 \cdot w + [\![ n \neq 0 ]\!] \cdot 4 - 2 \right) [n/i - 1] && \text{(see Example 15)} \\
&= q\_0 + q\_1 + q\_2 + q\_3 && \text{(see Example 13)}
\end{aligned}
$$

*where* $q\_0 = y \cdot 4^n$*. For* $z \leftarrow x + 1$*, we get*

$$
\begin{aligned}
q\_z &= \left( q\_x + 1 \right) [n/n - 1] \\
&= 0^n \cdot z + \sum\_{i=1}^n 0^{n-i} \cdot \left( q\_x + 1 \right) [n/i - 1] \\
&= [\![ n = 0 ]\!] \cdot z + [\![ n \neq 0 ]\!] \cdot \left( q\_x [n/n - 1] + 1 \right) \\
&= [\![ n = 0 ]\!] \cdot z + [\![ n \neq 0 ]\!] \cdot \left( \left( x + 2 \cdot n \right) [n/n - 1] + 1 \right) && \text{(see Example 15)} \\
&= [\![ n = 0 ]\!] \cdot z + [\![ n \neq 0 ]\!] \cdot \left( x - 1 \right) + [\![ n \neq 0 ]\!] \cdot 2 \cdot n.
\end{aligned}
$$

*So the closed form of Example 4 for the values of the variables after* n *iterations is:*

$$
\begin{bmatrix} q\_w \\ q\_x \\ q\_y \\ q\_z \end{bmatrix} = \begin{bmatrix} [\![ n = 0 ]\!] \cdot w + [\![ n \neq 0 ]\!] \cdot 2 \\ x + 2 \cdot n \\ q\_0 + q\_1 + q\_2 + q\_3 \\ [\![ n = 0 ]\!] \cdot z + [\![ n \neq 0 ]\!] \cdot (x - 1) + [\![ n \neq 0 ]\!] \cdot 2 \cdot n \end{bmatrix}
$$

$$\square$$
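As a sanity check on Examples 18 and 23, the sketch below compares the closed form of Example 4 with a direct simulation, using exact rationals (`fractions.Fraction`) so that the thirds cancel without rounding. It assumes the simultaneous update $w \leftarrow 2$, $x \leftarrow x + 2$, $y \leftarrow 2 \cdot w + 4 \cdot y - 2$, $z \leftarrow x + 1$; the last two updates appear in Example 18, while the first two are read off from $q\_w$ and $q\_x$ and are therefore an assumption. The helper names `step`, `iverson`, `closed_form`, and `check` are hypothetical.

```python
from fractions import Fraction

def step(w, x, y, z):
    """One simultaneous application of the assumed update of Example 4."""
    return 2, x + 2, 2 * w + 4 * y - 2, x + 1

def iverson(cond):
    """[[cond]] as an integer 0 or 1."""
    return 1 if cond else 0

def closed_form(w, x, y, z, n):
    """(q_w, q_x, q_y, q_z) after n iterations, as in Examples 18 and 23."""
    four_n = Fraction(4) ** n
    q_w = iverson(n == 0) * w + iverson(n != 0) * 2
    q_x = x + 2 * n
    q0 = y * four_n
    q1 = iverson(n != 0) * Fraction(1, 2) * w * four_n
    q2 = iverson(n > 1) * (Fraction(-4, 3) + Fraction(1, 3) * four_n)
    q3 = Fraction(2, 3) - Fraction(2, 3) * four_n
    q_z = iverson(n == 0) * z + iverson(n != 0) * (x - 1) + iverson(n != 0) * 2 * n
    return q_w, q_x, q0 + q1 + q2 + q3, q_z

def check(w, x, y, z, max_n=12):
    """True iff the closed form matches the simulated loop for 0 <= n <= max_n."""
    state = (w, x, y, z)
    for n in range(max_n + 1):
        if closed_form(w, x, y, z, n) != state:
            return False
        state = step(*state)
    return True
```

A finite check like this cannot replace the induction in Theorem 17, but a mismatch would immediately expose a transcription error in one of the $q\_i$.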

#### **4 Deciding Non-Termination of nnt-Loops**

Our proof uses the notion of *eventual non-termination* [4,14]. Here, the idea is to disregard the condition of the loop during a finite prefix of the program run.

**Definition 19 (Eventual Non-Termination).** *A vector* $\overline{c} \in \mathbb{Z}^d$ witnesses eventual non-termination *of (1) if*

$$
\exists n\_0 \in \mathbb{N}. \,\forall n \in \mathbb{N}\_{>n\_0}. \,\varphi[\overline{x}/f^n(\overline{c})].
$$

*If there is such a witness, then (1) is* eventually non-terminating*.*

Clearly, (1) is non-terminating iff (1) is eventually non-terminating [14]. Now Theorem 17 gives rise to an alternative characterization of eventual non-termination in terms of the closed form $\overline{q}$ instead of $f^n(\overline{c})$.

**Corollary 20 (Expressing Non-Termination with** PE**).** *If* $\overline{q}$ *is the closed form of (1), then* $\overline{c} \in \mathbb{Z}^d$ *witnesses eventual non-termination iff*

$$\exists n\_0 \in \mathbb{N}. \ \forall n \in \mathbb{N}\_{>n\_0}. \ \varphi[\overline{x}/\overline{q}][\overline{x}/\overline{c}]. \tag{18}$$

*Proof.* Immediate, as $\overline{q}[n/c] = f^c(\overline{x})$ for all $c \in \mathbb{N}$ by Theorem 17.

So to prove that termination of nnt-loops is decidable, we will use Corollary 20 to show that the existence of a witness for eventual non-termination is decidable. To do so, we first eliminate the factors $\psi$ from the closed form $\overline{q}$. Assume that $\overline{q}$ has at least one factor $\psi$ where $\psi$ is non-empty (otherwise, all factors $\psi$ are equivalent to 1) and let $c$ be the maximal constant that occurs in such a factor. Then, if $n > c$, all addends $\psi \cdot \alpha \cdot n^a \cdot b^n$ where $\psi$ contains a positive literal become 0 and all other addends become $\alpha \cdot n^a \cdot b^n$. Thus, as we can assume $n\_0 > c$ in (18) without loss of generality, all factors $\psi$ can be eliminated when checking eventual non-termination.

**Corollary 21 (Removing** ψ **from** PE**s).** *Let* $\overline{q}$ *be the closed form of an nnt-loop (1). Let* $\overline{q}\_{norm}$ *result from* $\overline{q}$ *by removing all addends* $\psi \cdot \alpha \cdot n^a \cdot b^n$ *where* $\psi$ *contains a positive literal and by replacing all addends* $\psi \cdot \alpha \cdot n^a \cdot b^n$ *where* $\psi$ *does not contain a positive literal by* $\alpha \cdot n^a \cdot b^n$*. Then* $\overline{c} \in \mathbb{Z}^d$ *is a witness for eventual non-termination iff*

$$\exists n\_0 \in \mathbb{N}. \ \forall n \in \mathbb{N}\_{>n\_0}. \ \varphi[\overline{x}/\overline{q}\_{norm}][\overline{x}/\overline{c}].\tag{19}$$
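The normalization of Corollary 21 is mechanical, which the following sketch illustrates. It assumes (as in the definition of poly-exponential expressions, which is not repeated in this section) that each $\psi$ is a conjunction of literals $n = c$ (positive) or $n \neq c$ (negative). The tuple encoding and the names `add_affine` and `normalize` are hypothetical; the sketch also merges addends with equal $(b, a)$, matching the w.l.o.g. assumption of Definition 22. The test encodes the factor $[\![ n > 1 ]\!]$ of $q\_2$ as $[\![ n \neq 0 \land n \neq 1 ]\!]$, which agrees with it on $\mathbb{N}$.

```python
from fractions import Fraction

# Hypothetical encoding of an addend [[psi]] * alpha * n**a * b**n as (psi, alpha, a, b):
#   psi   -- tuple of literals ("eq", c) for n = c or ("ne", c) for n != c
#   alpha -- affine coefficient: dict from a variable name (or "1" for the
#            constant part) to a Fraction

def add_affine(p, q):
    """Sum of two affine expressions; entries that become zero are dropped."""
    out = dict(p)
    for k, v in q.items():
        out[k] = out.get(k, Fraction(0)) + v
    return {k: v for k, v in out.items() if v != 0}

def normalize(addends):
    """Corollary 21: drop addends whose psi contains a positive literal n = c,
    drop psi from all other addends, and merge addends with the same (b, a).
    Returns a dict {(b, a): alpha}, i.e., an element of NPE[x]."""
    merged = {}
    for psi, alpha, a, b in addends:
        if any(kind == "eq" for kind, _ in psi):
            continue  # [[psi]] = 0 for every n larger than the constants in psi
        merged[(b, a)] = add_affine(merged.get((b, a), {}), alpha)
    # Addends whose coefficient cancelled to 0 disappear entirely.
    return {ba: alpha for ba, alpha in merged.items() if alpha}
```

Applied to $q\_z$ from Example 18, this yields the coefficients of $x - 1 + 2 \cdot n$; applied to $q\_0, \dots, q\_3$, it yields $4^n \cdot (y - \frac{1}{3} + \frac{1}{2} \cdot w) - \frac{2}{3}$, as in Example 23.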

By removing the factors ψ from the closed form q of an nnt-loop, we obtain *normalized* poly-exponential expressions.

**Definition 22 (Normalized** PE**s).** *We call* $p \in \mathbb{PE}[\overline{x}]$ normalized *if it is in*

$$\mathbb{NPE}[\overline{x}] = \left\{ \sum\_{j=1}^{\ell} \alpha\_j \cdot n^{a\_j} \cdot b\_j^n \, \middle| \, \ell, a\_j \in \mathbb{N}, \ \alpha\_j \in \mathrm{Af}[\overline{x}], \ b\_j \in \mathbb{N}\_{\geq 1} \right\}.$$

*W.l.o.g., we always assume* $(b\_i, a\_i) \neq (b\_j, a\_j)$ *for all* $i, j \in \{1, \dots, \ell\}$ *with* $i \neq j$*. We define* $\mathbb{NPE} = \mathbb{NPE}[\emptyset]$*, i.e., we have* $p \in \mathbb{NPE}$ *if* $\alpha\_j \in \mathbb{Q}$ *for all* $1 \leq j \leq \ell$*.*

**Example 23.** *We continue Example 18. By omitting the factors* ψ*,*

$$
\begin{aligned}
q\_w &= [\![ n = 0 ]\!] \cdot w + [\![ n \neq 0 ]\!] \cdot 2 && \text{becomes } 2, \\
q\_z &= [\![ n = 0 ]\!] \cdot z + [\![ n \neq 0 ]\!] \cdot (x - 1) + [\![ n \neq 0 ]\!] \cdot 2 \cdot n && \text{becomes } x - 1 + 2 \cdot n,
\end{aligned}
$$

*and* $q\_x = x + 2 \cdot n$, $q\_0 = y \cdot 4^n$*, and* $q\_3 = \frac{2}{3} - \frac{2}{3} \cdot 4^n$ *remain unchanged. Moreover,*

$$
\begin{aligned}
q\_1 &= [\![ n \neq 0 ]\!] \cdot \frac{1}{2} \cdot w \cdot 4^n && \text{becomes } \frac{1}{2} \cdot w \cdot 4^n \text{ and} \\
q\_2 &= [\![ n > 1 ]\!] \cdot \left( -\frac{4}{3} \right) + [\![ n > 1 ]\!] \cdot \frac{1}{3} \cdot 4^n && \text{becomes } \left( -\frac{4}{3} \right) + \frac{1}{3} \cdot 4^n.
\end{aligned}
$$

*Thus,* $q\_y = q\_0 + q\_1 + q\_2 + q\_3$ *becomes*

$$y \cdot 4^n + \frac{1}{2} \cdot w \cdot 4^n - \frac{4}{3} + \frac{1}{3} \cdot 4^n + \frac{2}{3} - \frac{2}{3} \cdot 4^n = 4^n \cdot \left( y - \frac{1}{3} + \frac{1}{2} \cdot w \right) - \frac{2}{3}.$$

*Let* $\sigma = \left[ w/2,\ x/x + 2 \cdot n,\ y/4^n \cdot \left( y - \frac{1}{3} + \frac{1}{2} \cdot w \right) - \frac{2}{3},\ z/x - 1 + 2 \cdot n \right]$*. Then we get that Example 2 is non-terminating iff there are* $w, x, y, z \in \mathbb{Z}$, $n\_0 \in \mathbb{N}$ *such that*

$$
\begin{aligned}
& (y+z)\,\sigma > 0 \land (-w-2\cdot y+x)\,\sigma > 0 \\
\iff\ & 4^n \cdot \left( y - \frac{1}{3} + \frac{1}{2} \cdot w \right) - \frac{2}{3} + x - 1 + 2\cdot n > 0 \ \land \\
& -2 - 2\cdot \left( 4^n \cdot \left( y - \frac{1}{3} + \frac{1}{2} \cdot w \right) - \frac{2}{3} \right) + x + 2\cdot n > 0 \\
\iff\ & p\_1^{\varphi} > 0 \land p\_2^{\varphi} > 0
\end{aligned}
$$

*holds for all* $n > n\_0$ *where*

$$\begin{aligned} p\_1^\varphi &= 4^n \cdot \left( y - \frac{1}{3} + \frac{1}{2} \cdot w \right) + 2 \cdot n + x - \frac{5}{3} \text{ and} \\ p\_2^\varphi &= 4^n \cdot \left( \frac{2}{3} - 2 \cdot y - w \right) + 2 \cdot n + x - \frac{2}{3} . \end{aligned}$$
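The polynomials $p\_1^{\varphi}$ and $p\_2^{\varphi}$ can be cross-checked against the two guard terms $y + z$ and $-w - 2 \cdot y + x$ evaluated on the simulated state after $n$ iterations. As in the earlier check, the update of Example 4 is an assumption read off from Examples 18 and 23, and all function names are hypothetical. Only $n \geq 2$ is compared, since $c = 1$ is the largest constant occurring in a factor $\psi$ and normalization is only claimed for $n > c$.

```python
from fractions import Fraction

def step(w, x, y, z):
    """Assumed simultaneous update of Example 4."""
    return 2, x + 2, 2 * w + 4 * y - 2, x + 1

def p1(w, x, y, n):
    """p_1^phi from Example 23 (initial values w, x, y; iteration count n)."""
    return Fraction(4) ** n * (y - Fraction(1, 3) + Fraction(1, 2) * w) + 2 * n + x - Fraction(5, 3)

def p2(w, x, y, n):
    """p_2^phi from Example 23."""
    return Fraction(4) ** n * (Fraction(2, 3) - 2 * y - w) + 2 * n + x - Fraction(2, 3)

def agrees(w, x, y, z, max_n=12):
    """Compare p_1^phi and p_2^phi with y + z and -w - 2*y + x on the state
    after n iterations, for 2 <= n <= max_n."""
    state = step(*step(w, x, y, z))
    for n in range(2, max_n + 1):
        cw, cx, cy, cz = state
        if p1(w, x, y, n) != cy + cz or p2(w, x, y, n) != -cw - 2 * cy + cx:
            return False
        state = step(*state)
    return True
```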

Recall that the loop condition $\varphi$ is a conjunction of inequalities of the form $\alpha > 0$ where $\alpha \in \mathrm{Af}[\overline{x}]$. Thus, $\varphi[\overline{x}/\overline{q}\_{norm}]$ is a conjunction of inequalities $p > 0$ where $p \in \mathbb{NPE}[\overline{x}]$, and we need to decide if there is an instantiation of these inequalities that is valid "for large enough $n$". To do so, we order the coefficients $\alpha\_j$ of the addends $\alpha\_j \cdot n^{a\_j} \cdot b\_j^n$ of normalized poly-exponential expressions according to the addend's asymptotic growth when increasing $n$. Lemma 24 shows that $\alpha\_2 \cdot n^{a\_2} \cdot b\_2^n$ grows faster than $\alpha\_1 \cdot n^{a\_1} \cdot b\_1^n$ iff $b\_2 > b\_1$ or both $b\_2 = b\_1$ and $a\_2 > a\_1$.

**Lemma 24 (Asymptotic Growth).** *Let* $b\_1, b\_2 \in \mathbb{N}\_{\geq 1}$ *and* $a\_1, a\_2 \in \mathbb{N}$*. If* $(b\_2, a\_2) >\_{lex} (b\_1, a\_1)$*, then* $\mathcal{O}(n^{a\_1} \cdot b\_1^n) \subsetneq \mathcal{O}(n^{a\_2} \cdot b\_2^n)$*. Here,* $>\_{lex}$ *is the lexicographic order, i.e.,* $(b\_2, a\_2) >\_{lex} (b\_1, a\_1)$ *iff* $b\_2 > b\_1$ *or* $b\_2 = b\_1 \land a\_2 > a\_1$*.*

*Proof.* By considering the cases b<sup>2</sup> > b<sup>1</sup> and b<sup>2</sup> = b<sup>1</sup> separately, the claim can easily be deduced from the definition of O. 

**Definition 25 (Ordering Coefficients).** Marked coefficients *are of the form* $\alpha^{(b,a)}$ *where* $\alpha \in \mathrm{Af}[\overline{x}]$, $b \in \mathbb{N}\_{\geq 1}$*, and* $a \in \mathbb{N}$*. We define* $\mathrm{unmark}(\alpha^{(b,a)}) = \alpha$ *and* $\alpha\_2^{(b\_2,a\_2)} \succ \alpha\_1^{(b\_1,a\_1)}$ *if* $(b\_2, a\_2) >\_{lex} (b\_1, a\_1)$*. Let*

$$p = \sum\_{j=1}^{\ell} \alpha\_j \cdot n^{a\_j} \cdot b\_j^n \in \mathbb{NPE}[\overline{x}],$$

*where* $\alpha\_j \neq 0$ *for all* $1 \leq j \leq \ell$*. The marked coefficients of* $p$ *are*

$$\text{coeffs}(p) = \begin{cases} \left\{ 0^{(1,0)} \right\}, & \text{if } \ell = 0\\ \left\{ \alpha\_j^{(b\_j, a\_j)} \, \Big|\, 1 \le j \le \ell \right\}, & \text{otherwise.} \end{cases}$$

**Example 26.** *In Example 23 we saw that the loop from Example 2 is non-terminating iff there are* $w, x, y, z \in \mathbb{Z}$, $n\_0 \in \mathbb{N}$ *such that* $p\_1^{\varphi} > 0 \land p\_2^{\varphi} > 0$ *for all* $n > n\_0$*. We get:*

$$\begin{aligned} \text{coeffs}\left(p\_1^{\varphi}\right) &= \left\{ \left(y - \frac{1}{3} + \frac{1}{2} \cdot w\right)^{(4,0)}, 2^{(1,1)}, \left(x - \frac{5}{3}\right)^{(1,0)} \right\}, \\ \text{coeffs}\left(p\_2^{\varphi}\right) &= \left\{ \left(\frac{2}{3} - 2 \cdot y - w\right)^{(4,0)}, 2^{(1,1)}, \left(x - \frac{2}{3}\right)^{(1,0)} \right\} \end{aligned}$$
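Because Python compares tuples lexicographically, the order $\succ$ of Definition 25 amounts to sorting marked coefficients by their mark $(b, a)$. The sketch below assumes a hypothetical encoding of a normalized PE as a dict from marks $(b, a)$ to affine coefficients (themselves dicts from a variable name, or `"1"` for the constant, to a `Fraction`); `coeffs_sorted` is an illustrative name, not part of the paper.

```python
from fractions import Fraction

def coeffs_sorted(p):
    """Marked coefficients of p (Definition 25) as a list of ((b, a), alpha),
    starting with the >-maximal one. Tuple comparison in Python is
    lexicographic, so sorting by (b, a) realizes >lex."""
    nonzero = {mark: alpha for mark, alpha in p.items() if alpha}
    if not nonzero:
        return [((1, 0), {})]  # coeffs(0) = {0^(1,0)}; {} encodes the coefficient 0
    return sorted(nonzero.items(), key=lambda item: item[0], reverse=True)

# p_1^phi and p_2^phi from Example 23 in this encoding.
P1 = {(4, 0): {"y": Fraction(1), "w": Fraction(1, 2), "1": Fraction(-1, 3)},
      (1, 1): {"1": Fraction(2)},
      (1, 0): {"x": Fraction(1), "1": Fraction(-5, 3)}}
P2 = {(4, 0): {"y": Fraction(-2), "w": Fraction(-1), "1": Fraction(2, 3)},
      (1, 1): {"1": Fraction(2)},
      (1, 0): {"x": Fraction(1), "1": Fraction(-2, 3)}}
```

For both polynomials the first entry is the coefficient marked $(4,0)$, followed by $2^{(1,1)}$ and the coefficient marked $(1,0)$, as listed in Example 26.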

Now it is easy to see that the asymptotic growth of a normalized poly-exponential expression is solely determined by its $\succ$-maximal addend.

**Corollary 27 (Maximal Addend Determines Asymptotic Growth).** *Let* $p \in \mathbb{NPE}$ *and let* $\max\_{\succ}(\mathrm{coeffs}(p)) = c^{(b,a)}$*. Then* $\mathcal{O}(p) = \mathcal{O}(c \cdot n^a \cdot b^n)$*.*

*Proof.* Clear, as $c \cdot n^a \cdot b^n$ is the asymptotically dominating addend of $p$.

Note that Corollary 27 would be incorrect for the case $c = 0$ if we replaced $\mathcal{O}(p) = \mathcal{O}(c \cdot n^a \cdot b^n)$ with $\mathcal{O}(p) = \mathcal{O}(n^a \cdot b^n)$, as $\mathcal{O}(0) \neq \mathcal{O}(1)$. Building upon Corollary 27, we now show that, for large $n$, the sign of a normalized poly-exponential expression is solely determined by its $\succ$-maximal coefficient. Here, we define $\mathrm{sign}(c) = -1$ if $c \in \mathbb{Q}\_{<0} \cup \{-\infty\}$, $\mathrm{sign}(0) = 0$, and $\mathrm{sign}(c) = 1$ if $c \in \mathbb{Q}\_{>0} \cup \{\infty\}$.

**Lemma 28 (Sign of** NPE**s).** *Let* $p \in \mathbb{NPE}$*. Then* $\lim\_{n \to \infty} p \in \mathbb{Q}$ *iff* $p \in \mathbb{Q}$*, and otherwise,* $\lim\_{n \to \infty} p \in \{\infty, -\infty\}$*. Moreover, we have*

$$\mathrm{sign}\left(\lim\_{n \to \infty} p\right) = \mathrm{sign}\left(\mathrm{unmark}\left(\max\_{\succ}(\mathrm{coeffs}(p))\right)\right).$$

*Proof.* If $p \notin \mathbb{Q}$, then the limit of each non-constant addend of $p$ is in $\{-\infty, \infty\}$ by definition of $\mathbb{NPE}$. As the asymptotically dominating addend determines $\lim\_{n \to \infty} p$ and $\mathrm{unmark}(\max\_{\succ}(\mathrm{coeffs}(p)))$ determines the sign of the asymptotically dominating addend, the claim follows.
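For a ground $p \in \mathbb{NPE}$ (rational coefficients only), Lemma 28 reduces the sign of the limit to a dictionary lookup. The sketch below is illustrative: it encodes $p$ as a hypothetical dict $\{(b, a): \alpha\}$ and also includes `evaluate`, which is useful only for spot checks at a single large $n$ and proves nothing about the limit.

```python
from fractions import Fraction

def sign(c):
    """-1, 0, or 1 for a rational c."""
    return (c > 0) - (c < 0)

def limit_sign(p):
    """Lemma 28: the sign of lim_{n -> oo} p equals the sign of the coefficient
    with the lexicographically largest mark (b, a); p = 0 has sign 0."""
    nonzero = {mark: c for mark, c in p.items() if c != 0}
    if not nonzero:
        return 0
    return sign(nonzero[max(nonzero)])

def evaluate(p, n):
    """Value of sum(alpha * n**a * b**n) for a concrete n."""
    return sum(c * n ** a * b ** n for (b, a), c in p.items())
```

For example, $-4^n + 10^6 \cdot n^5$ is eventually negative because the mark $(4, 0)$ dominates $(1, 5)$, even though the polynomial part is huge for small $n$.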

Lemma 29 shows the connection between the limit of a normalized poly-exponential expression $p$ and the question whether $p$ is positive for large enough $n$. The latter corresponds to the existence of a witness for eventual non-termination by Corollary 21, as $\varphi[\overline{x}/\overline{q}\_{norm}]$ is a conjunction of inequalities $p > 0$ where $p \in \mathbb{NPE}[\overline{x}]$.

**Lemma 29 (Limits and Positivity of** NPE**s).** *Let* $p \in \mathbb{NPE}$*. Then*

$$\exists n\_0 \in \mathbb{N}. \,\forall n \in \mathbb{N}\_{>n\_0}. \,\,p>0 \iff \lim\_{n \to \infty} p>0.$$

*Proof.* By case analysis over $\lim\_{n \to \infty} p$.

Now we show that Corollary 21 allows us to decide eventual non-termination by examining the coefficients of normalized poly-exponential expressions. As these coefficients are in Af[x], the required reasoning is decidable.

**Lemma 30 (Deciding Eventual Positiveness of** NPE**s).** *Validity of*

$$\exists \overline{c} \in \mathbb{Z}^d, n\_0 \in \mathbb{N}. \ \forall n \in \mathbb{N}\_{>n\_0}. \ \bigwedge\_{i=1}^k p\_i[\overline{x}/\overline{c}] > 0 \tag{20}$$

*where* $p\_1, \dots, p\_k \in \mathbb{NPE}[\overline{x}]$ *is decidable.*

*Proof.* For any $p\_i$ with $1 \leq i \leq k$ and any $\overline{c} \in \mathbb{Z}^d$, we have $p\_i[\overline{x}/\overline{c}] \in \mathbb{NPE}$. Hence:

$$
\begin{aligned}
& \exists n\_0 \in \mathbb{N}.\ \forall n \in \mathbb{N}\_{>n\_0}.\ \bigwedge\_{i=1}^k p\_i[\overline{x}/\overline{c}] > 0\\
\iff\ & \bigwedge\_{i=1}^k \exists n\_0 \in \mathbb{N}.\ \forall n \in \mathbb{N}\_{>n\_0}.\ p\_i[\overline{x}/\overline{c}] > 0\\
\iff\ & \bigwedge\_{i=1}^k \lim\_{n \to \infty} p\_i[\overline{x}/\overline{c}] > 0 && \text{(by Lemma 29)}\\
\iff\ & \bigwedge\_{i=1}^k \mathrm{unmark}(\max\_{\succ}(\mathrm{coeffs}(p\_i[\overline{x}/\overline{c}]))) > 0 && \text{(by Lemma 28)}
\end{aligned}
$$

Let $p \in \mathbb{NPE}[\overline{x}]$ with $\mathrm{coeffs}(p) = \{\alpha\_1^{(b\_1,a\_1)}, \dots, \alpha\_\ell^{(b\_\ell,a\_\ell)}\}$ where $\alpha\_i^{(b\_i,a\_i)} \succ \alpha\_j^{(b\_j,a\_j)}$ for all $1 \leq i < j \leq \ell$. If $p[\overline{x}/\overline{c}] = 0$ holds, then $\mathrm{coeffs}(p[\overline{x}/\overline{c}]) = \{0^{(1,0)}\}$ and thus $\mathrm{unmark}(\max\_{\succ}(\mathrm{coeffs}(p[\overline{x}/\overline{c}]))) = 0$. Otherwise, there is a $1 \leq j \leq \ell$ with $\mathrm{unmark}(\max\_{\succ}(\mathrm{coeffs}(p[\overline{x}/\overline{c}]))) = \alpha\_j[\overline{x}/\overline{c}] \neq 0$, and we have $\alpha\_i[\overline{x}/\overline{c}] = 0$ for all $1 \leq i \leq j - 1$. Hence, $\mathrm{unmark}(\max\_{\succ}(\mathrm{coeffs}(p[\overline{x}/\overline{c}]))) > 0$ holds iff $\bigvee\_{j=1}^{\ell} \left( \alpha\_j[\overline{x}/\overline{c}] > 0 \land \bigwedge\_{i=1}^{j-1} \alpha\_i[\overline{x}/\overline{c}] = 0 \right)$ holds, i.e., iff $[\overline{x}/\overline{c}]$ is a model for

$$\text{max\_coeff\_pos}(p) = \bigvee\_{j=1}^{\ell} \left( \alpha\_j > 0 \land \bigwedge\_{i=1}^{j-1} \alpha\_i = 0 \right). \tag{21}$$

Hence by the considerations above, (20) is valid iff

$$\exists \overline{c} \in \mathbb{Z}^d.\ \bigwedge\_{i=1}^k \text{max\_coeff\_pos}(p\_i)[\overline{x}/\overline{c}] \tag{22}$$

is valid. By multiplying each (in-)equality in (22) with the least common multiple of all denominators, one obtains a first-order formula over the theory of linear integer arithmetic. It is well known that validity of such formulas is decidable. 

Note that (22) is valid iff $\bigwedge\_{i=1}^k \text{max\_coeff\_pos}(p\_i)$ is satisfiable. So to implement our decision procedure, one can use integer programming or SMT solvers to check satisfiability of $\bigwedge\_{i=1}^k \text{max\_coeff\_pos}(p\_i)$. Lemma 30 allows us to prove our main theorem.

**Theorem 31.** *Termination of triangular loops is decidable.*

*Proof.* By Theorem 8, termination of triangular loops is decidable iff termination of nnt-loops is decidable. For an nnt-loop (1) we obtain a $\overline{q}\_{norm} \in \mathbb{NPE}[\overline{x}]^d$ (see Theorem 17 and Corollary 21) such that (1) is non-terminating iff

$$\exists \overline{c} \in \mathbb{Z}^d, n\_0 \in \mathbb{N}. \ \forall n \in \mathbb{N}\_{>n\_0}. \ \varphi[\overline{x}/\overline{q}\_{norm}][\overline{x}/\overline{c}],\tag{20}$$

where $\varphi$ is a conjunction of inequalities of the form $\alpha > 0$, $\alpha \in \mathrm{Af}[\overline{x}]$. Hence,

$$\varphi[\overline{x}/\overline{q}\_{norm}][\overline{x}/\overline{c}] = \bigwedge\_{i=1}^{k} p\_i[\overline{x}/\overline{c}] > 0$$

where $p\_1, \dots, p\_k \in \mathbb{NPE}[\overline{x}]$. Thus, by Lemma 30, validity of (20) is decidable. The following algorithm summarizes our decision procedure.


**Example 32.** *In Example 26 we showed that Example 2 is non-terminating iff*

$$\exists w, x, y, z \in \mathbb{Z}, \ n\_0 \in \mathbb{N}. \ \forall n \in \mathbb{N}\_{>n\_0}. \ p\_1^{\varphi} > 0 \land p\_2^{\varphi} > 0$$

*is valid. This is the case iff* $\text{max\_coeff\_pos}(p\_1^{\varphi}) \land \text{max\_coeff\_pos}(p\_2^{\varphi})$*, i.e.,*

$$
\begin{aligned}
& \Big[ \left( y - \frac{1}{3} + \frac{1}{2} \cdot w > 0 \right) \lor \left( 2 > 0 \land y - \frac{1}{3} + \frac{1}{2} \cdot w = 0 \right) \lor \left( x - \frac{5}{3} > 0 \land 2 = 0 \land y - \frac{1}{3} + \frac{1}{2} \cdot w = 0 \right) \Big] \\
\land\ & \Big[ \left( \frac{2}{3} - 2 \cdot y - w > 0 \right) \lor \left( 2 > 0 \land \frac{2}{3} - 2 \cdot y - w = 0 \right) \lor \left( x - \frac{2}{3} > 0 \land 2 = 0 \land \frac{2}{3} - 2 \cdot y - w = 0 \right) \Big]
\end{aligned}
$$

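*is satisfiable. Since* $\frac{2}{3} - 2 \cdot y - w = -2 \cdot \left( y - \frac{1}{3} + \frac{1}{2} \cdot w \right)$*, the first conjunct is equivalent to* $y - \frac{1}{3} + \frac{1}{2} \cdot w \geq 0$ *and the second one to* $y - \frac{1}{3} + \frac{1}{2} \cdot w \leq 0$*. So this formula is equivalent to* $6 \cdot y - 2 + 3 \cdot w = 0$*, which does not have any integer solutions, as* $6 \cdot y + 3 \cdot w$ *is divisible by 3 while 2 is not. Hence, the loop of Example 2 terminates.*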
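Formula (21) can be evaluated directly on concrete values, which gives a quick plausibility check of Example 32. The sketch below is a bounded brute-force search over a small integer box, not the decision procedure itself: as noted after Lemma 30, a real implementation would pass $\bigwedge\_i \text{max\_coeff\_pos}(p\_i)$ to an SMT solver or an integer-programming tool. The names `max_coeff_pos`, `P1`, `P2`, and `models_in_box` are illustrative, and coefficients are encoded as Python functions of an assignment.

```python
from fractions import Fraction
from itertools import product

def max_coeff_pos(ordered_coeffs, env):
    """Formula (21) under the assignment env: coefficients are listed from the
    >-maximal one downwards; some alpha_j must be positive while all alpha_i
    before it are zero."""
    values = [alpha(env) for alpha in ordered_coeffs]
    return any(values[j] > 0 and all(v == 0 for v in values[:j])
               for j in range(len(values)))

# Coefficients of p_1^phi and p_2^phi (Example 26), >-maximal first.
P1 = [lambda e: e["y"] - Fraction(1, 3) + Fraction(1, 2) * e["w"],
      lambda e: Fraction(2),
      lambda e: e["x"] - Fraction(5, 3)]
P2 = [lambda e: Fraction(2, 3) - 2 * e["y"] - e["w"],
      lambda e: Fraction(2),
      lambda e: e["x"] - Fraction(2, 3)]

def models_in_box(bound):
    """All integer (w, x, y) in [-bound, bound]^3 satisfying both conjuncts
    (z does not occur in the formula)."""
    rng = range(-bound, bound + 1)
    return [(w, x, y) for w, x, y in product(rng, rng, rng)
            if max_coeff_pos(P1, {"w": w, "x": x, "y": y})
            and max_coeff_pos(P2, {"w": w, "x": x, "y": y})]
```

The box search finds no integer model, in line with Example 32, whereas the rational assignment $w = 0$, $y = \frac{1}{3}$ satisfies both conjuncts; this shows that integrality is what makes the loop terminate.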

Example 33 shows that our technique does not yield witnesses for non-termination; it only proves the existence of a witness for *eventual* non-termination. While such a witness can be transformed into a witness for non-termination by applying the loop several times, it is unclear how often the loop needs to be applied.

**Example 33.** *Consider the following non-terminating loop:*

$$\text{while } x > 0 \text{ do } \begin{bmatrix} x \\ y \end{bmatrix} \leftarrow \begin{bmatrix} x+y \\ 1 \end{bmatrix} \tag{23}$$

*The closed form of* $x$ *is* $q = [\![ n = 0 ]\!] \cdot x + [\![ n \neq 0 ]\!] \cdot (x + y + n - 1)$*. Replacing* $x$ *with* $q\_{norm}$ *in* $x > 0$ *yields* $x + y + n - 1 > 0$*. The maximal marked coefficient of* $x + y + n - 1$ *is* $1^{(1,1)}$*. So by Algorithm 4,* (23) *does not terminate if* $\exists x, y \in \mathbb{Z}.\ 1 > 0$ *is valid. While* $1 > 0$ *is a tautology,* (23) *terminates if* $x \leq 0$ *or* $x \leq -y$*.*
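The gap between eventual and actual non-termination in Example 33 can be made concrete by simulation. For loop (23), a state with $x > 0$ and $x + y > 0$ never leaves the loop, since afterwards $y = 1$ and $x$ only grows. The sketch below applies the update while ignoring the guard until such a state is reached; the names `step`, `runs`, and `first_witness` are hypothetical, and the search is capped, so it cannot detect that no witness exists.

```python
def step(x, y):
    """Update of loop (23): x <- x + y and y <- 1, simultaneously."""
    return x + y, 1

def runs(x, y, limit=1000):
    """Number of iterations of (23) started in (x, y), capped at limit."""
    count = 0
    while x > 0 and count < limit:
        x, y = step(x, y)
        count += 1
    return count

def first_witness(x, y, limit=1000):
    """Smallest k such that f^k(x, y), computed while ignoring the guard,
    satisfies x > 0 and x + y > 0, i.e., is a witness for non-termination.
    Returns None if no such k < limit exists."""
    for k in range(limit):
        if x > 0 and x + y > 0:
            return k
        x, y = step(x, y)
    return None
```

Starting from $(x, y) = (1, -m)$ with $m \geq 1$, the loop stops after one iteration, yet $f^{m+1}(1, -m) = (1, 1)$ is a witness for non-termination; the required number of applications grows with the input, which is exactly why it is unclear how often the loop needs to be applied.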

However, the final formula constructed by Algorithm 4 precisely describes all witnesses for eventual non-termination.

**Lemma 34 (Witnessing Eventual Non-Termination).** *Let (1) be a triangular loop, let* $\overline{q}\_{norm}$ *be the normalized closed form of (2), and let*

$$(\varphi \wedge \varphi[\overline{x}/A\,\overline{x} + \overline{a}])\left[\overline{x}/\overline{q}\_{norm}\right] = \bigwedge\_{i=1}^{k} p\_i > 0.$$

*Then* $\overline{c} \in \mathbb{Z}^d$ *witnesses eventual non-termination of (1) iff* $[\overline{x}/\overline{c}]$ *is a model for*

$$\bigwedge\_{i=1}^{k} \text{max\_coeff\_pos}(p\_i).$$

#### **5 Conclusion**

We presented a decision procedure for termination of affine integer loops with triangular update matrices. In this way, we contribute to the ongoing challenge of proving the 15-year-old conjecture by Tiwari [15] that termination of affine integer loops is decidable. After linear loops [4], loops with at most 4 variables [14], and loops with diagonalizable update matrices [3,14], triangular loops are the fourth important special case where decidability could be proven.

The key idea of our decision procedure is to compute *closed forms* for the values of the program variables after a symbolic number of iterations $n$. While these closed forms are rather complex, it turns out that reasoning about first-order formulas over the theory of linear integer arithmetic suffices to analyze their behavior for large $n$. This allows us to reduce (non-)termination of triangular loops to integer programming. In future work, we plan to investigate generalizations of our approach to other classes of integer loops.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **AliveInLean: A Verified LLVM Peephole Optimization Verifier**

Juneyoung Lee<sup>1</sup> (✉), Chung-Kil Hur<sup>1</sup>, and Nuno P. Lopes<sup>2</sup>

<sup>1</sup> Seoul National University, Seoul, Republic of Korea, juneyoung.lee@sf.snu.ac.kr

<sup>2</sup> Microsoft Research, Cambridge, UK

**Abstract.** Ensuring that compiler optimizations are correct is important for the reliability of the entire software ecosystem, since all software is compiled. Alive [12] is a tool for verifying LLVM's peephole optimizations. Since Alive was released, it has helped compiler developers proactively find dozens of bugs in LLVM, avoiding potentially hazardous miscompilations. Despite having verified many LLVM optimizations so far, Alive is itself not verified, which has led to at least once declaring an optimization correct when it was not.

We introduce AliveInLean, a formally verified peephole optimization verifier for LLVM. As the name suggests, AliveInLean is a reengineered version of Alive developed in the Lean theorem prover [14]. Assuming that the proof obligations are correctly discharged by an SMT solver, AliveInLean gives the same level of correctness guarantees as state-of-the-art formal frameworks such as CompCert [11], Peek [15], and Vellvm [26], while inheriting the advantages of Alive (significantly more automation and easy adoption by compiler developers).

**Keywords:** Compiler verification · Peephole optimization · LLVM · Lean · Alive

#### **1 Introduction**

Verifying compiler optimizations is important to ensure the reliability of the software ecosystem. Various frameworks have been proposed to verify optimizations of industrial compilers. Among them, Alive [12] is a tool for verifying peephole optimizations of LLVM that has been successfully adopted by compiler developers. Since it was released, Alive has helped developers find dozens of bugs.

Figure 1 shows the structure of Alive. An optimization pattern of interest written in a domain-specific language is given as input. Alive parses the input and encodes the behavior of the source and target programs into logic formulas in the theory of quantified bit-vectors and arrays. Finally, several proof obligations are created from the encoded behavior and then checked by an SMT solver.

Alive relies on a three-fold trust base: (1) the semantics of LLVM's intermediate representation and of SMT expressions, (2) Alive's verification condition generator, and (3) the SMT solver used to discharge proof obligations. None of these are formally verified, and thus an error in any of them may result in an incorrect answer.

**Fig. 1.** The structure of Alive and AliveInLean

To address this problem, we introduce AliveInLean, a formally verified peephole optimization verifier for LLVM. AliveInLean is written in Lean [14], an interactive theorem proving language. Its semantics of LLVM IR (Intermediate Representation) and SMT expressions are rigorously tested using Lean's metaprogramming language [5] and system library. AliveInLean's verification condition generator is formally verified in Lean.

Using AliveInLean requires less human effort than directly proving the optimizations on formal frameworks thanks to automation given by SMT solvers. For example, verifying the correctness of a peephole optimization on a formal framework requires more than a hundred lines of proofs [15]. However, the correctness of AliveInLean relies on the correctness of the used SMT solver. To counteract the dependency on SMT solvers, proof obligations can be cross-checked with multiple SMT solvers. Moreover, there is substantial work towards making SMT solvers generate proof certificates [2,3,6,7].

AliveInLean is a proof of concept. It currently does not support all operations that Alive does, e.g., memory-related operations. However, AliveInLean supports all integer peephole optimizations, which is already useful in practice as most bugs found by Alive were in integer optimizations [12].

#### **2 Overview**

We give an overview of AliveInLean's features from a user's perspective.

**Verifying Optimizations.** AliveInLean reads optimization(s) from a file and checks their correctness. A user writes an optimization of interest in a DSL with similar syntax to that of LLVM IR:

```
Name: AddSub:1309
%lhs = and i4 %a, %b
%rhs = or i4 %a, %b
%r = add i4 %lhs, %rhs
  =>
%r = add i4 %a, %b
```
This example transformation corresponds to rewriting (%a & %b) + (%a | %b) to %a + %b, given 4-bit integers %a and %b. The last variable %r, or *root* variable, is assumed to be the return value of the programs. AliveInLean encodes the behavior of each program and generates verification conditions (VCs). Finally, AliveInLean calls Z3 to discharge the VCs.
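For a bit width as small as 4, the integer identity behind AddSub:1309 can also be confirmed by exhaustive enumeration, which is a cheap way to sanity-check a pattern before handing it to a verifier. The sketch below is only an illustration: it models `i4` values as Python integers reduced modulo $2^4$, ignores undef and poison inputs, and the helper `equivalent` is hypothetical. AliveInLean itself discharges the VCs with Z3 and is not limited to tiny widths.

```python
def equivalent(src, tgt, bits=4):
    """Exhaustively compare src(a, b) and tgt(a, b) modulo 2**bits, i.e., with
    LLVM's wrapping semantics for add/and/or on iN values."""
    mask = (1 << bits) - 1
    return all((src(a, b) & mask) == (tgt(a, b) & mask)
               for a in range(1 << bits) for b in range(1 << bits))

# AddSub:1309: (a & b) + (a | b)  =>  a + b
holds = equivalent(lambda a, b: (a & b) + (a | b), lambda a, b: a + b)
```

Replacing `|` by `^` in the source pattern makes the check fail (e.g., for a = b = 1), which illustrates how such a brute-force test can catch a mistyped pattern.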

**Proving Useful Properties.** AliveInLean can be used as a formal framework to prove lemmas using interactive theorem proving. This is helpful when a user wants to show a property of a program which is hard to represent as a transformation.

For example, one may want to prove that the divisor of udiv (unsigned division) is never poison<sup>1</sup> if it did not raise undefined behavior (UB). The lemma below states this in Lean. This lemma says that the divisor val is never poison if the state st' after executing the udiv instruction (step) has no UB.

```
lemma never_poison:
  forall .. (HSTEP: some st' = step st (udiv isz name op1 op2))
            (HNOUB: not (has_ub st'))
            (HVAL: some val = get_value st op2 (ty.int isz)),
    not (is_poison val)
```
**Testing Specifications.** AliveInLean supports random testing of AliveInLean's specifications (for which no verification is possible). For example, the step function in the above example implements a specification of the LLVM IR, and it can be tested with respect to the behavior of the LLVM compiler. Another part of the trust base is the specification of SMT expressions, which defines a relation between expressions (with no free variables) and their corresponding concrete values.

These tests help build confidence in the validity of VC generation. Running tests is helpful when a user wants to use a different version of LLVM or modify AliveInLean's specifications (e.g., adding a new instruction to IR).

#### **3 Verifying Optimizations**

In this section we introduce the different components of AliveInLean that work together to verify an optimization.

<sup>1</sup> poison is a special value of LLVM representing a result of an erroneous computation.

#### **3.1 Semantics Encoder**

Given a program and an initial state, the semantics encoder produces the final state of the program as a set of SMT expressions. The IR interpreter is similar, but works over concrete values rather than symbolic ones. The semantics encoder and the IR interpreter share the same codebase (essentially the LLVM IR semantics). The code is parametric on the type of the program state. For example, the type of the undefined behavior flag can be instantiated either with Lean's bool type or with the SMT Bool expression type. Given the type, Lean automatically resolves which operations to use to update the state via typeclass resolution.
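The idea of sharing one semantics between the interpreter and the encoder can be sketched in Python, which has no typeclasses, by passing the operations explicitly. This is a loose analogy rather than AliveInLean's Lean code: `ConcreteOps`, `SymbolicOps`, and `udiv_ub` are hypothetical names, the symbolic backend merely builds SMT-LIB terms as strings (with a 4-bit zero literal `#x0`), and the UB rule is simplified to division by zero.

```python
class ConcreteOps:
    """Operations over concrete values: Python bools and ints."""
    false = False

    @staticmethod
    def or_(p, q):
        return p or q

    @staticmethod
    def is_zero(v):
        return v == 0

class SymbolicOps:
    """Operations that build SMT-LIB terms as strings (no solver involved)."""
    false = "false"

    @staticmethod
    def or_(p, q):
        return f"(or {p} {q})"

    @staticmethod
    def is_zero(v):
        return f"(= {v} #x0)"

def udiv_ub(ops, ub, divisor):
    """UB flag after a udiv: already set, or the divisor is zero.
    The same function serves as interpreter (ConcreteOps) and encoder (SymbolicOps)."""
    return ops.or_(ub, ops.is_zero(divisor))
```

With `ConcreteOps`, `udiv_ub(ConcreteOps, False, 0)` yields `True`; with `SymbolicOps`, `udiv_ub(SymbolicOps, SymbolicOps.false, "b")` yields the term `(or false (= b #x0))`. In Lean, the choice of operations is made by typeclass resolution instead of an explicit argument.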

#### **3.2 Refinement Encoder**

Given a source program, a transformed program, and an initial state, the refinement encoder emits an SMT expression that encodes the refinement check between the final states of the two programs. To obtain the final states, the semantics encoder is used.

The refinement check proves that (1) the transformed program only triggers UB when the original program does (i.e., UB can only be removed), (2) the root variable of the transformed program is only poison when it is also poison in the original program, and (3) variables' values in the final states of the two programs are the same when no UB is triggered and the original value is not poison.

#### **3.3 Parser and Z3 Backend**

The parser for Alive's DSL is implemented using Lean's parser monad and file I/O library. SMT expressions are processed with Z3 using Lean's SMT interface.

#### **4 Correctness of AliveInLean**

We describe how the correctness of AliveInLean is proved. First, we explain the correctness proof of the semantics encoder and the refinement encoder. We show that if the SMT expression encoded by the refinement encoder is valid, the optimization is indeed correct. Next, we explain how the trust base is tested.

#### **4.1 Semantics Encoding**

Given an IR interpreter *run*, a semantics encoder *encoder* is correct with respect to *run* if, for any IR program and input state, the final program state generated by *run* and the symbolic state encoded by *encoder* are equivalent.

To formally define its correctness, an equivalence relation between SMT expressions and concrete values is defined. We say that an SMT expression e and a Lean value ν are equivalent, or e ∼ ν, if e has no free variables and it evaluates to ν. The equivalence relation is inductively defined with respect to the structure of an SMT expression. To deal with free variables, an environment η is defined, which is a set of pairs (x, ν) where x is a variable and ν is a concrete value. η[[e]] is an expression with all free variables x replaced with ν if (x, ν) ∈ η.
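A toy version of the equivalence $e \sim \nu$ and of applying an environment $\eta[\![e]\!]$ can be written over a tiny expression language with variables, constants, and wrapping addition. The sketch is a simplification under stated assumptions: AliveInLean defines $\sim$ inductively over the structure of SMT expressions, whereas here it is implemented directly as "no free variables and evaluates to $\nu$", and the environment is a dict rather than a set of pairs. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Add:
    lhs: "Expr"
    rhs: "Expr"
    width: int  # bit width; addition wraps modulo 2**width

Expr = Union[Var, Const, Add]

def apply_env(env, e):
    """eta[[e]]: replace each free variable x by v whenever (x, v) is in env."""
    if isinstance(e, Var):
        return Const(env[e.name]) if e.name in env else e
    if isinstance(e, Add):
        return Add(apply_env(env, e.lhs), apply_env(env, e.rhs), e.width)
    return e

def free_vars(e):
    if isinstance(e, Var):
        return {e.name}
    if isinstance(e, Add):
        return free_vars(e.lhs) | free_vars(e.rhs)
    return set()

def evaluate(e):
    """Value of a closed expression."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Add):
        return (evaluate(e.lhs) + evaluate(e.rhs)) % (1 << e.width)
    raise ValueError(f"free variable {e.name}")

def equiv(e, v):
    """e ~ v: e has no free variables and evaluates to v."""
    return not free_vars(e) and evaluate(e) == v
```

For instance, `Add(Var("x"), Const(3), 4)` is not equivalent to any value because `x` is free, but after applying the environment `{"x": 14}` it is equivalent to `1`, since $(14 + 3) \bmod 16 = 1$.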

Next, we define a program state. A state s is defined as (u, r) where u is an undefined behavior flag and r is a register file. r is a list of (x, v) where x is a variable and v is a value. v is defined as (sz, i, p) where sz is its size in bits, i is an integer value, and p is a poison flag.

There are two kinds of states: symbolic states and concrete states. A symbolic state $s\_s$ is a state whose $u$, $i$, $p$ are SMT expressions. A concrete state $s\_c$ is a state all of whose attributes are concrete values. We say that $s\_s$ and $s\_c$ are equivalent, or $s\_s \sim s\_c$, if $s\_s$ has no free variables in its attributes and their corresponding attributes are equivalent. $\eta[\![s\_s]\!]$ is a symbolic state with the environment $\eta$ applied to $u$, $i$, $p$.

Now, the correctness of encoder with respect to run is defined as follows. It states that the result of encoder is equivalent to the result of run.

**Theorem 1.** *For all initial states* $s\_s$*,* $s\_c$*, program* $p$*, and environment* $\eta$ *s.t.* $\eta[\![s\_s]\!] \sim s\_c$*, we have that* $\eta[\![\mathit{encoder}(p, s\_s)]\!] \sim \mathit{run}(p, s\_c)$*.*

#### **4.2 Refinement Encoding**

Function check($p_{src}$, $p_{tgt}$, $s^s$) generates an SMT expression that encodes refinement between the source program $p_{src}$ and the target program $p_{tgt}$.

We first define refinement between two concrete states. As in Alive, AliveInLean only checks the value of the root variable of a program. Given a root variable $r$, a concrete state $s'^c$ refines $s^c$, written $s'^c \sqsubseteq s^c$, if (1) $s^c$ has undefined behavior, or (2) both $s^c$ and $s'^c$ assign values to $r$, say $v$ and $v'$ respectively, and $v = \mathsf{poison} \lor v = v'$. A target program $p_{tgt}$ refines a source program $p_{src}$ if $\mathit{run}(p_{tgt}, s^c) \sqsubseteq \mathit{run}(p_{src}, s^c)$ holds for every initial concrete state $s^c$.
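The two cases of concrete-state refinement translate directly into a check. This Python sketch assumes a simplified state in which each register holds either an integer or a `POISON` marker (the bit width and separate poison flag of the (sz, i, p) triples are collapsed), and it follows the two cases above literally; `ConcreteState` and `refines` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

POISON = "poison"  # hypothetical marker standing in for a poison value
Value = Union[int, str]


@dataclass
class ConcreteState:
    ub: bool  # undefined behavior flag u
    regs: Dict[str, Value] = field(default_factory=dict)  # register file r


def refines(tgt: ConcreteState, src: ConcreteState, root: str) -> bool:
    """Return True if tgt refines src with respect to the root variable."""
    if src.ub:  # case (1): the source has undefined behavior
        return True
    if root not in src.regs or root not in tgt.regs:
        return False  # case (2) needs a value for root in both states
    v, v_prime = src.regs[root], tgt.regs[root]
    return v == POISON or v == v_prime  # v = poison or v = v'
```

A poison source value can thus be refined by any target value, while a defined source value must be matched exactly.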

The correctness of check is stated as follows.

**Theorem 2.** *Given an initial symbolic state* $s^s$, *if* $\eta_0[\![\mathit{check}(p_{src}, p_{tgt}, s^s)]\!] \sim \mathsf{true}$ *for any* $\eta_0$, *then for any environment* $\eta$ *and initial state* $s^c$ *s.t.* $\eta[\![s^s]\!] \sim s^c$, *we have that* $\mathit{run}(p_{tgt}, s^c) \sqsubseteq \mathit{run}(p_{src}, s^c)$.

This theorem says that if the expression returned by check evaluates to true in every environment (i.e., it is valid), then program $p_{tgt}$ refines program $p_{src}$.

#### **4.3 Validity of Trust-Base**

**Testing Specification of SMT Expressions.** Specifications of SMT expressions are traversed using Lean's metaprogramming language and tested. Our testing differs from QuickChick [4], which evaluates expressions in Coq; that approach cannot be used here because SMT expressions need to be evaluated by an SMT solver (e.g., Z3). An example specification:

```lean
forall {sz : size} (s1 s2 : sbitvec sz) (b1 b2 : bitvector sz),
  bv_equiv s1 b1 -> bv_equiv s2 b2 ->
    bv_equiv (sbitvec.add s1 s2) (bitvector.add b1 b2)
```
This spec says that if SMT expressions s1, s2 of a bit-vector type (sbitvec) are equivalent to two concrete bit-vector values b1, b2 in Lean (bitvector), an add expression of s1, s2 is equivalent to the result of adding b1 and b2. Function bitvector.add must be called in Lean, so its operands (b1, b2) are assigned random values in Lean. sbitvec.add is translated to SMT's bvadd expression, and s1 and s2 are initialized as BitVec variables in an SMT solver. The testing function generates an SMT expression with random inputs like the following:

`(assert (forall ((s1 (_ BitVec 4))) (forall ((s2 (_ BitVec 4))) (=> (= s1 #xA) (=> (= s2 #x2) (= (bvadd s1 s2) #xC))))))`

Here the bit-vector size (sz) is set to 4, and b1 and b2 are randomly initialized to 10 (#xA) and 2 (#x2). A specification is incorrect if the generated SMT expression is not valid.
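The query generation just described can be sketched in Python. The sketch only builds the SMT-LIB text for the `bvadd` specification. The concrete sum, which AliveInLean obtains by calling `bitvector.add` in Lean, is computed here with Python integer arithmetic instead. The function names are hypothetical, and sending the query to a solver such as Z3 is not shown.

```python
import random


def bv_literal(value: int, sz: int) -> str:
    """SMT-LIB bit-vector literal: hex (#x..) if sz is a multiple of 4, else binary (#b..)."""
    value %= 1 << sz  # wrap into the sz-bit range
    if sz % 4 == 0:
        return "#x" + format(value, "0{}X".format(sz // 4))
    return "#b" + format(value, "0{}b".format(sz))


def add_spec_query(sz: int, b1: int, b2: int) -> str:
    """Instantiate the bvadd spec: if s1 = b1 and s2 = b2 then bvadd s1 s2 = b1 + b2."""
    expected = bv_literal(b1 + b2, sz)  # concrete result computed outside the solver
    return (
        "(assert (forall ((s1 (_ BitVec {n}))) (forall ((s2 (_ BitVec {n}))) "
        "(=> (= s1 {a}) (=> (= s2 {b}) (= (bvadd s1 s2) {r}))))))"
    ).format(n=sz, a=bv_literal(b1, sz), b=bv_literal(b2, sz), r=expected)


def random_add_query(sz: int, rng: random.Random) -> str:
    """Draw random concrete operands, mirroring the random testing described above."""
    return add_spec_query(sz, rng.randrange(1 << sz), rng.randrange(1 << sz))
```

With sz = 4, b1 = 10, and b2 = 2, `add_spec_query(4, 10, 2)` reproduces the query shown above character for character.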

**Testing Specification of LLVM IR.** The specification of LLVM IR is tested using randomly generated IR programs. IR programs of 5–10 randomly chosen instructions are generated, compiled with LLVM, and run. The result of executing each program is compared with the result of AliveInLean's IR interpreter.

#### **5 Evaluation**

For the evaluation, we used a computer with an Intel Core i5-6600 CPU and 8 GB of RAM, and Z3 [13] for SMT solving. To test whether AliveInLean and Alive give the same result, we used all of the 150 integer optimizations from Alive's test suite that are supported by AliveInLean. No mismatches were observed.

To test the SMT specification, we randomly generated 10,000 tests for each of the operations (18 bit-vector and 15 boolean). This test took 3 CPU hours.

The LLVM IR specification was tested by running 1,000,000 random IR programs in our interpreter and comparing the output with that of LLVM. This comparison needs to take into account that some programs may trigger UB or yield a poison value, which gives freedom to LLVM to produce a variety of results. These tests took 10 CPU hours overall. Four admitted arithmetic lemmas were tested as well. As a side-effect of the testing, we found several miscompilation bugs in LLVM.<sup>2</sup>

AliveInLean<sup>3</sup> consists of 11.9K lines of code. The optimization verifier consists of 2.2K LoC, the specification tester is 1.5K, and the proof has 8.1K lines. It took 3 person-months to implement the tool and prove its correctness.

#### **6 Related Work**

We introduce previous work on compiler verification and validation and compare it with AliveInLean. We also give an overview of previous work on the semantics of compiler intermediate representations (IRs).

<sup>2</sup> https://llvm.org/PR40657.

<sup>3</sup> https://github.com/Microsoft/AliveInLean.

#### **6.1 Compiler Verification**

**Proving Correctness on Formal Semantics.** The correctness of compilation can be proved on a formal semantics of a language written in a theorem-proving language such as Coq. Vellvm [26] is a Coq formalization of the semantics of LLVM IR. CompCert [11] is a verified C compiler written in Coq, and its compilation to assembly languages, including x86 and PowerPC, is proved correct.

However, it is hard to apply this approach to existing industrial compilers because proving the correctness of optimizations requires non-trivial effort. Peek [15] is a framework for implementing and verifying peephole optimizations for x86 on CompCert. Its authors implemented 28 peephole optimizations, which required 3.3k lines of code and 6.6k lines of proofs (∼350 lines per optimization). Even though this is small compared to the size of CompCert, the burden is non-trivial considering that LLVM has more than 1,000 peephole optimizations [12].

Another problem with this approach is that changing the semantics requires modifying the proof. The semantics of LLVM's poison and undef values are currently inconsistent, which triggers miscompilations of some programs [10]. Therefore, compiler developers regularly test various undef semantics against existing optimizations, which would be a non-trivial task if correctness proofs had to be updated manually.

**Translation Validation and Credible Compilation.** In translation validation [18], a pair of an original program and an optimized program is given to a validation tool at compile time to check the correctness of the optimization. Several such tools exist for LLVM [20,22,25]. Translation validation is, however, slow, and it cannot tell whether an optimization is correct in general. Consider this optimization:

$$z = 0 - (x \,/\, C) \quad\Longrightarrow\quad z = x \,/\, (-C)$$

If C is a constant, -C can be computed at compile time. However, this optimization is wrong when, for example, C is INT_MIN. A translation validator only checks the instances that actually occur during compilation, so to show that the optimization is correct in general, it would need to be run for every combination of inputs.
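To see why checking individual instances is not enough, the following Python sketch exhaustively checks the rewrite for one constant at a time over 4-bit integers. It assumes signed division with LLVM `sdiv`-style undefined behavior (division by zero and INT_MIN / -1), plain wrapping subtraction without `nsw`, and -C folded with wrap-around. The helper names are hypothetical, and this brute-force check is an illustration, not how Alive or AliveInLean verify optimizations.

```python
def wrap(v: int, bits: int) -> int:
    """Interpret v modulo 2^bits as a signed two's-complement integer."""
    m = 1 << bits
    v &= m - 1
    return v - m if v >= m // 2 else v


def sdiv(a: int, b: int, bits: int):
    """Signed division truncating toward zero; None models undefined behavior."""
    int_min = -(1 << (bits - 1))
    if b == 0 or (a == int_min and b == -1):
        return None
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q


def counterexamples(c: int, bits: int = 4) -> list:
    """Inputs x for which z = x / -C does not refine z = 0 - (x / C)."""
    neg_c = wrap(-c, bits)  # -C is folded at compile time with wrap-around
    bad = []
    for x in range(-(1 << (bits - 1)), 1 << (bits - 1)):
        t = sdiv(x, c, bits)
        if t is None:
            continue  # source is undefined: any target behavior is allowed
        src = wrap(0 - t, bits)  # plain (wrapping) subtraction
        tgt = sdiv(x, neg_c, bits)
        if tgt is None or tgt != src:
            bad.append(x)
    return bad
```

Here `counterexamples(3)` returns an empty list, so validating the rewrite for C = 3 reveals nothing. In contrast, `counterexamples(-8)` (C = INT_MIN for 4-bit integers) returns `[-8]`.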

Credible compilation [19], or witnessing compiler [16,17], is an approach to improve translation validation by accepting witnesses generated by a compiler. Crellvm [8] is a credible compilation framework for LLVM. It requires modifications to the compiler, which makes it harder to apply and maintain.

#### **6.2 Solver-Aided Programming Languages**

Proving the correctness of optimizations can be framed as a search problem that looks for a counter-example to the optimization. Tools such as Z3 and CVC4 can be used to solve this search problem. Translating a high-level search problem into the external solver's input is considered bug-prone, and frameworks like Rosette [21] and Smten [23] address this issue by providing higher-level languages for describing the search problem. SpaceSearch [24] helps programmers prove the correctness of the description by supporting Coq and Rosette backends from a single specification. AliveInLean provides a stronger guarantee of correctness because the translation to SMT expressions is also written in Lean, leaving Lean as the sole trust-base.

#### **6.3 Semantics of Compiler IR**

Correctly encoding semantics of compiler IR is important for the validity of a tool. LLVM IR is an SSA-based intermediate representation which is used to represent a program being compiled. LLVM LangRef [1] has an informal definition of the LLVM IR, but there are a few known problems. [10] shows that the semantics of poison and undef values are inconsistent. [9] shows that the semantics of pointer↔integer casting is inconsistent. AliveInLean supports poison but not undef, following the suggestion from [10]. AliveInLean does not support memory-related operations including load, store, and pointer ↔ integer casting.

#### **7 Discussion**

AliveInLean has several limitations. As discussed before, AliveInLean does not support memory operations. Correctly encoding the memory model of LLVM IR is challenging because it is more complex than either a byte array or a set of memory objects [9]. Supporting branch instructions and floating point would allow developers to prove more interesting optimizations. Supporting branches is challenging, especially when loops are involved.

The maintainability of AliveInLean depends heavily on one's proficiency in Lean. Changing the semantics of an IR instruction breaks the proof, and updating it requires proficiency in Lean. However, because the proof is modularized, we believe that only the relevant parts of the proof need to be updated.

Alive has features that are absent in AliveInLean. Alive supports defining a precondition for an optimization, inferring types of variables if not given, and showing counter-examples if the optimization is wrong. We leave this as future work.

#### **8 Conclusion**

AliveInLean is a formally verified compiler optimization verifier. Its verification condition generator is formally verified with a machine-checked proof. Using AliveInLean, developers can easily check the correctness of compiler optimizations with high reliability. Also, they can use AliveInLean as a formal framework like Vellvm to prove properties of interest in limited cases. The extensive random testing did not find problems in the trust base, increasing its trustworthiness. Moreover, as a side-effect of the IR semantics testing, we found several bugs in LLVM.

**Acknowledgments.** The authors thank Leonardo de Moura and Sebastian Ullrich for their help with Lean. This work was supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (2017R1A2B2007512). The first author was supported by a Korea Foundation for Advanced Studies scholarship.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Concurrency

## **Automated Parameterized Verification of CRDTs**

Kartik Nagar and Suresh Jagannathan

Purdue University, West Lafayette, USA {nagark,suresh}@cs.purdue.edu

**Abstract.** Maintaining multiple replicas of data is crucial to achieving scalability, availability and low latency in distributed applications. *Conflict-free Replicated Data Types* (CRDTs) are important building blocks in this domain because they are designed to operate correctly under the myriad behaviors possible in a weakly-consistent distributed setting. Because of the possibility of concurrent updates to the same object at different replicas, and the absence of any ordering guarantees on these updates, *convergence* is an important correctness criterion for CRDTs. This property asserts that two replicas which receive the same set of updates (in any order) must nonetheless converge to the same state. One way to prove that operations on a CRDT converge is to show that they commute since commutative actions by definition behave the same regardless of the order in which they execute. In this paper, we present a framework for automatically verifying convergence of CRDTs under different weak-consistency policies. Surprisingly, depending upon the consistency policy supported by the underlying system, we show that not all operations of a CRDT need to commute to achieve convergence. We develop a proof rule parameterized by a consistency specification based on the concepts of *commutativity modulo consistency policy* and *non-interference to commutativity*. We describe the design and implementation of a verification engine equipped with this rule and show how it can be used to provide the first automated convergence proofs for a number of challenging CRDTs, including sets, lists, and graphs.

#### **1 Introduction**

For distributed applications, keeping a single copy of data at one location, or multiple fully-synchronized copies (i.e., state-machine replication) at different locations, makes the application susceptible to loss of availability due to network and machine failures. On the other hand, having multiple un-synchronized replicas of the data results in high availability, fault tolerance, and uniform low latency, albeit at the expense of consistency. In the latter case, an update issued at one replica can be asynchronously transmitted to other replicas, allowing the system to operate continuously even in the presence of network or node failures [8]. However, mechanisms must now be provided to keep replicas consistent with each other in the face of concurrent updates and arbitrary re-ordering of such updates by the underlying network.

Over the last few years, *Conflict-free Replicated Datatypes* (CRDTs) [19–21] have emerged as a popular solution to this problem. In op-based CRDTs, when an operation on a CRDT instance is issued at a replica, an *effector* (basically an update function) is generated locally, which is then asynchronously transmitted (and applied) at all other replicas.<sup>1</sup> Over the years, a number of CRDTs have been developed for common datatypes such as maps, sets, lists, graphs, etc.

The primary correctness criterion for a CRDT implementation is *convergence* (sometimes called *strong eventual consistency* (SEC) [9,20]): two replicas that have received the same set of effectors must converge to the same CRDT state. Because of the weak default guarantees assumed to be provided by the underlying network, however, we must consider the possibility that effectors are applied in arbitrary order on different replicas, which complicates correctness arguments. This complexity is further exacerbated because CRDTs impose no limitations on how often they are invoked, and may assume additional properties of network behaviour [14] that must be taken into account when formulating correctness arguments.

Given these complexities, verifying convergence of operations in a replicated setting has proven to be challenging and error-prone [9]. In response, several recent efforts have used mechanized proof assistants to yield formal machine-checked proofs of correctness [9,24]. While mechanization clearly offers stronger assurance than handwritten proofs, it still demands substantial manual proof engineering effort to be successful. In particular, correctness arguments are typically given in terms of constraints on CRDT states that must be satisfied by the underlying network model responsible for delivering updates performed by other replicas. Relating the state of a CRDT at one replica with the visibility properties allowed by the underlying network has typically involved constructing an intricate simulation argument or crafting a suitably precise invariant to establish convergence. This level of sophisticated reasoning is required for every CRDT and consistency model under consideration. There is a notable lack of techniques capable of reasoning about CRDT correctness under different weak consistency policies, even though such techniques exist for other correctness criteria such as preservation of state invariants [10,11] or serializability [4,16] under weak consistency.

To overcome these challenges, we propose a novel *automated* verification strategy that does not require complex proof engineering of handcrafted simulation arguments or invariants. Instead, our methodology allows us to directly connect constraints on events imposed by the consistency model with constraints on states required to prove convergence. Consistency model constraints are extracted from an axiomatization of network behavior, while state constraints are generated using reasoning principles that determine the *commutativity* and *non-interference* of sequences of effectors, subject to these consistency constraints. Both sets of constraints can be solved using off-the-shelf theorem provers. Because our approach is parametric in the weak consistency scheme, we are able to analyze the problem of CRDT convergence under widely different consistency policies (e.g., eventual consistency, causal consistency, parallel snapshot isolation (PSI) [23], among others), and, for the first time, verify CRDT convergence under such stronger models (efficient implementations of which are supported by real-world data stores). A further pleasant by-product of our approach is a pathway to take advantage of such stronger models to simplify existing CRDT designs and to compose CRDTs into new instantiations for more complex datatypes.

<sup>1</sup> In this work, we focus on the op-based CRDT model; however, our technique naturally extends to state-based CRDTs since they can be emulated by an op-based model [20].

The paper makes the following contributions:


Collectively, these contributions yield (to the best of our knowledge) the first automated and parameterized proof methodology for CRDT verification.

The remainder of the paper is organized as follows. In the next section, we provide further motivation and intuition for our approach. Section 3 formalizes the problem definition, providing an operational semantics and axiomatizations of well-known consistency specifications. Section 4 describes our proof strategy for determining CRDT convergence that is amenable to automated verification. Section 5 provides details about our implementation and experimental results justifying the effectiveness of our framework. Section 6 presents related work and conclusions.

#### **2 Illustrative Example**

$$\begin{array}{l} S \in \mathcal{P}(E) \\ \mathsf{Add}(a){:}S \triangleq \lambda S'.\, S' \cup \{a\} \\ \mathsf{Remove}(a){:}S \triangleq \lambda S'.\, S' \setminus \{a\} \\ \mathsf{Lookup}(a){:}S \triangleq a \in S \end{array}$$

**Fig. 1.** A simple Set CRDT definition.

We illustrate our approach using a Set CRDT specification as a running example. A CRDT $(\Sigma, O, \sigma_{init})$ is characterized by a set of states $\Sigma$, a set of operations $O$, and an initial state $\sigma_{init} \in \Sigma$, where each operation $o \in O$ is a function with signature $\Sigma \to (\Sigma \to \Sigma)$. The state of a CRDT is replicated, and when operation $o$ is issued at a replica with state $\sigma$, the effector $o(\sigma)$ is generated, which is immediately applied at the local replica (which we also call the *source* replica) and transmitted to all other replicas, where it is subsequently applied upon receipt.

Additional constraints on the order in which effectors can be received and applied at different replicas are specified by a consistency policy, discussed below. In the absence of any such additional constraints, however, we assume the underlying network only offers *eventually consistent* guarantees: all replicas eventually receive all effectors generated by all other replicas, with no constraints on the order in which these effectors are received.

Consider the simple Set CRDT definition shown in Fig. 1. Let $E$ be an arbitrary set of elements. The state space $\Sigma$ is $\mathcal{P}(E)$, the power set of $E$. Add(a):S denotes the operation Add(a) applied on a replica with state S; it generates an effector that simply adds a to the state of any replica it is applied to. Similarly, Remove(a):S generates an effector that removes a on all replicas to which it is applied. Lookup(a):S is a query operation that checks whether the queried element is present in the source replica S.

A CRDT is *convergent* if, during any execution, any two replicas that have received the same set of effectors have the same state. Our strategy to prove convergence is to show that any two effectors of the CRDT pairwise commute modulo a consistency policy, i.e., for two effectors $e_1$ and $e_2$, $e_1 \circ e_2 = e_2 \circ e_1$. Our simple Set CRDT clearly does not converge when executed on an eventually consistent data store, since the effectors $e_1$ = Add(a):$S_1$ and $e_2$ = Remove(a):$S_2$ do not commute, and the semantics of eventual consistency imposes no additional constraints on the visibility or ordering of these operations that could be used to guarantee convergence. For example, if $e_1$ is applied to the state at some replica followed by $e_2$, the resulting state does not include the element a; conversely, applying $e_2$ to a state at some replica followed by $e_1$ leads to a state that does contain a.
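The failure of commutativity for the Set CRDT of Fig. 1 can be checked concretely. In this Python sketch, a replica state is a `frozenset` and each operation returns its effector as a closure. The function names mirror the figure but are otherwise hypothetical, and `commute_on` tests commutativity on a single state only, not in general.

```python
from typing import Callable, FrozenSet

State = FrozenSet[str]
Effector = Callable[[State], State]


def add(a: str, source: State) -> Effector:
    """Add(a):S -- the effector inserts a, independently of the source state S."""
    return lambda s: s | {a}


def remove(a: str, source: State) -> Effector:
    """Remove(a):S -- the effector deletes a."""
    return lambda s: s - {a}


def lookup(a: str, source: State) -> bool:
    """Lookup(a):S -- a query evaluated only at the source replica."""
    return a in source


def commute_on(e1: Effector, e2: Effector, s: State) -> bool:
    """Check e1 . e2 = e2 . e1 on one particular state s."""
    return e1(e2(s)) == e2(e1(s))
```

With `s0 = frozenset()`, `e1 = add("a", s0)`, and `e2 = remove("a", s0)`, applying `e1` then `e2` yields `frozenset()`, while applying `e2` then `e1` yields `frozenset({'a'})`, so `commute_on(e1, e2, s0)` is `False`.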

However, while commutativity is a sufficient property to show convergence, it is not always a necessary one. In particular, different consistency models impose different constraints on the visibility and ordering of effectors that can obviate the need to reason about their commutativity. For example, if the consistency model requires Add(a) and Remove(a) effectors to be applied in the same order at all replicas, then the Set CRDT will converge. As we will demonstrate later, the PSI consistency model exactly matches this requirement.

$$\begin{array}{l} S \in \mathcal{P}(E \times I) \\ \mathsf{Add}(a,i){:}S \triangleq \lambda S'.\, S' \cup \{(a,i)\} \\ \mathsf{Remove}(a){:}S \triangleq \lambda S'.\, S' \setminus \{(a,i) : (a,i) \in S\} \\ \mathsf{Lookup}(a){:}S \triangleq \exists i.\, (a,i) \in S \end{array}$$

**Fig. 2.** A definition of an ORSet CRDT.

To further illustrate this, consider the definition of the ORSet CRDT shown in Fig. 2. Here, every element is tagged with a unique identifier (from the set I). Add(a,i):S simply adds the element a tagged with i,<sup>2</sup> while Remove(a):S returns an effector that, when applied to a replica state, removes all tagged versions of a that were present in S, the source replica.

<sup>2</sup> Assume that every call to Add uses a unique identifier, which can be easily arranged, for example by keeping a local counter at every replica which is incremented at every operation invocation, and using the id of the replica and the value of the counter as a unique identifier.

Suppose $e_1$ = Add(a,i):$S_1$ and $e_2$ = Remove(a):$S_2$. If $S_2$ does not contain (a,i), then these two effectors are guaranteed to commute, because $e_2$ is unaware of (a,i) and thus behaves as a no-op with respect to $e_1$ when applied to any replica state. Suppose, however, that $e_1$'s effect was visible to $e_2$; in other words, $e_1$ is applied to $S_2$ before $e_2$ is generated. There are two possible scenarios to consider. (1) Another replica (call it S') applies $e_2$ before $e_1$. Its final state reflects the effect of the Add operation, while $S_2$'s final state reflects the effect of the Remove; clearly, convergence is violated in this case. (2) All replicas apply $e_1$ and $e_2$ in the same order; the interesting case here is when $e_1$ is always applied before $e_2$ on every replica. The constraint that induces an effector order between $e_1$ and $e_2$ on every replica, as a consequence of $e_1$'s visibility to $e_2$ on $S_2$, is supported by a causally consistent distributed storage model. Under causal consistency, whenever $e_2$ is applied to a replica state, we are guaranteed that $e_1$'s effect, which adds (a,i) to the state, has already occurred. Thus, even though $e_1$ and $e_2$ do not commute when applied to an arbitrary state, their execution under causal consistency nonetheless allows us to show that all replica states converge.

The essence of our proof methodology is therefore to reason about *commutativity modulo consistency*: proving commutativity is required only for those CRDT operations unaffected by the constraints imposed by the consistency model. Consistency properties that affect the visibility of effectors are instead used to guide and simplify our analysis. Applying this notion to pairs of effectors in arbitrarily long executions requires incorporating commutativity properties into a more general induction principle, which lets us generalize the commutativity of effectors in bounded executions to the unbounded case. This generalization forms the heart of our automated verification strategy.

$$\begin{array}{l} S \in \mathcal{P}(E \times I) \times \mathcal{P}(E \times I) \\ \mathsf{Add}(a,i){:}(A,R) \triangleq \lambda(A',R').\,(A' \cup \{(a,i)\},\, R') \\ \mathsf{Remove}(a){:}(A,R) \triangleq \lambda(A',R').\,(A',\, R' \cup \{(a,i) : (a,i) \in A\}) \\ \mathsf{Lookup}(a){:}(A,R) \triangleq \exists i.\, (a,i) \in A \land (a,i) \notin R \end{array}$$

**Fig. 3.** A variant of the ORSet using tombstones.
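The scenarios for the ORSet of Fig. 2 can be replayed in a small Python sketch. Tagged elements are modeled as `(element, identifier)` pairs with integer identifiers, and the helper names are hypothetical. The sketch only checks individual executions and does not implement the paper's proof rule.

```python
from typing import Callable, FrozenSet, Tuple

Tag = Tuple[str, int]  # (element, unique identifier)
State = FrozenSet[Tag]
Effector = Callable[[State], State]


def add(a: str, i: int, source: State) -> Effector:
    """Add(a,i):S -- insert the tagged element (a, i)."""
    return lambda s: s | {(a, i)}


def remove(a: str, source: State) -> Effector:
    """Remove(a):S -- delete only the tags of a observed in the source state S."""
    observed = frozenset(t for t in source if t[0] == a)
    return lambda s: s - observed


def lookup(a: str, source: State) -> bool:
    """Lookup(a):S -- is some tagged copy of a present at the source?"""
    return any(t[0] == a for t in source)


init: State = frozenset()
e1 = add("a", 1, init)

# e1 not visible when Remove is generated: the effectors commute.
e2_blind = remove("a", init)
assert e1(e2_blind(init)) == e2_blind(e1(init))

# e1 visible to e2 (e1 was applied at the source replica first).
e2 = remove("a", e1(init))
causal = e2(e1(init))     # every replica applies e1 before e2
reordered = e1(e2(init))  # a replica that applies e2 first
assert not lookup("a", causal) and lookup("a", reordered)
```

The final assertion reflects scenario (1) above. The two orders disagree on whether a is present, and causal consistency rules out the second order.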

Figure 3 defines an ORSet with "tombstone" markers used to keep track of deleted elements in a separate set. Our proof methodology is sufficient to automatically show that this CRDT converges under eventual consistency (EC).
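In the tombstone variant of Fig. 3, each effector only adds elements to A or to R. Set union commutes, so the effectors commute no matter what the source replica observed. The Python sketch below tests this for one hypothetical set of effectors by applying them in every order. It is a sanity check under these assumptions, not the automated proof described in the paper, and the helper names are illustrative.

```python
from itertools import permutations
from typing import Callable, FrozenSet, Iterable, Tuple

Tag = Tuple[str, int]
State = Tuple[FrozenSet[Tag], FrozenSet[Tag]]  # (A, R): added tags, tombstones
Effector = Callable[[State], State]


def add(a: str, i: int, source: State) -> Effector:
    """Add(a,i):(A,R) -- record (a, i) in the add-set A."""
    return lambda s: (s[0] | {(a, i)}, s[1])


def remove(a: str, source: State) -> Effector:
    """Remove(a):(A,R) -- tombstone every tag of a in the source's add-set."""
    observed = frozenset(t for t in source[0] if t[0] == a)
    return lambda s: (s[0], s[1] | observed)


def lookup(a: str, source: State) -> bool:
    """Lookup(a):(A,R) -- some tag of a is added but not tombstoned."""
    added, removed = source
    return any(t[0] == a and t not in removed for t in added)


def converges(effectors: Iterable[Effector], init: State) -> bool:
    """True if every application order of the effectors yields the same state."""
    finals = set()
    for order in permutations(list(effectors)):
        s = init
        for e in order:
            s = e(s)
        finals.add(s)
    return len(finals) == 1
```

For example, with `init = (frozenset(), frozenset())` and the effectors `add("a", 1, init)`, `remove("a", add("a", 1, init)(init))`, and `add("a", 2, init)`, `converges` returns `True`.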

#### **3 Problem Definition**

In this section, we formalize the problem of determining convergence in CRDTs parametric to a weak consistency policy. First, we define a general operational semantics that describes all valid executions of a CRDT under any given weak consistency policy. As stated earlier, a CRDT program $P$ is specified by the tuple $(\Sigma, O, \sigma_{init})$. Here, we find it convenient to define an operation $o \in O$ as a function $(\Sigma \times (\Sigma \to \Sigma)^*) \to (\Sigma \to \Sigma)$. Instead of directly taking a generating state as input, operations now take as input a start state and a sequence of effectors; the intended semantics is that the sequence of effectors is applied to the start state to obtain the generating state. This formulation allows us to simplify the presentation of the proof methodology in the next section, since we can abstract a history of effectors into an equivalent start state.

Formally, if $\hat{o} : \Sigma \to (\Sigma \to \Sigma)$ is the original op-based definition, we define the operation $o : (\Sigma \times (\Sigma \to \Sigma)^*) \to (\Sigma \to \Sigma)$ as follows:

$$\begin{aligned} \forall \sigma. \quad & o(\sigma, \epsilon) = \hat{o}(\sigma) \\ \forall \sigma, \pi, f. \quad & o(\sigma, \pi f) = o(f(\sigma), \pi) \end{aligned}$$

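Here, $\epsilon$ denotes the empty sequence. Hence, for all states $\sigma$ and sequences of functions $\pi$, we have $o(\sigma, \pi) = \hat{o}(\pi(\sigma))$, where $\pi(\sigma)$ applies the effectors of $\pi$ to $\sigma$ from right to left (the last effector first).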
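The recursive definition of $o$ in terms of $\hat{o}$ amounts to applying the effectors of $\pi$ to $\sigma$, starting from the last one, and then calling $\hat{o}$ on the resulting state. Here is a minimal Python sketch in which effectors are plain functions, states are untyped, and `lift` is a hypothetical name.

```python
from typing import Any, Callable, Sequence

Effector = Callable[[Any], Any]


def lift(o_hat: Callable[[Any], Effector]) -> Callable[[Any, Sequence[Effector]], Effector]:
    """Turn o_hat : S -> (S -> S) into o : S x (S -> S)* -> (S -> S)."""

    def o(sigma: Any, pi: Sequence[Effector]) -> Effector:
        # o(sigma, pi f) = o(f(sigma), pi): the last effector of pi is applied first.
        for f in reversed(pi):
            sigma = f(sigma)
        return o_hat(sigma)  # o(sigma, epsilon) = o_hat(sigma)

    return o
```

For instance, if `o_hat = lambda gen: (lambda s: s + gen)`, then `lift(o_hat)(3, [lambda s: s * 2, lambda s: s + 1])` first increments 3 to 4, then doubles it to 8, and returns an effector that adds 8. Applying that effector to 0 gives `8`.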

To define the operational semantics, we abstract away from the concept of replicas and instead maintain a global pool of effectors. A new CRDT operation is executed against a CRDT state obtained by first selecting a subset of effectors from the global pool and then applying them, in some non-deterministically chosen permutation, to the initial CRDT state. The choice of effectors and their permutation must obey the weak consistency policy specification. Given a CRDT $P = (\Sigma, O, \sigma_{init})$ and a weak consistency policy $\Psi$, we define a **labeled transition system** $S_{P,\Psi} = (C, \rightarrow)$, where $C$ is a set of configurations and $\rightarrow$ is the transition relation. A **configuration** $c = (\Delta, \mathsf{vis}, \mathsf{eo})$ consists of three components: $\Delta$ is a set of events, $\mathsf{vis} \subseteq \Delta \times \Delta$ is a *visibility* relation, and $\mathsf{eo} \subseteq \Delta \times \Delta$ is a global *effector order* relation (constrained to be anti-symmetric). An **event** $\eta \in \Delta$ is a tuple $(eid, o, \sigma_s, \Delta_r, \mathsf{eo})$, where $eid$ is a unique event id, $o \in O$ is a CRDT operation, $\sigma_s \in \Sigma$ is the start CRDT state, $\Delta_r$ is the set of events visible to $\eta$ (also called the history of $\eta$), and $\mathsf{eo}$ is a total order on the events in $\Delta_r$ (also called the local effector order relation). We assume projection functions for each component of an event (for example, $\sigma_s(\eta)$ projects the start state of the event $\eta$).

Given an event $\eta = (eid, o, \sigma_s, \Delta_r, \mathsf{eo})$, we define $\eta^e$ to be the **effector** associated with the event. This effector is obtained by executing the CRDT operation $o$ against the start CRDT state $\sigma_s$ and the sequence of effectors obtained from the events in $\Delta_r$, arranged in the reverse order of $\mathsf{eo}$. Formally,

$$\eta^{e} = \begin{cases} o(\sigma_{s}, \epsilon) & \text{if } \Delta_{r} = \emptyset \\ o\big(\sigma_{s}, \prod_{i=1}^{k} \eta_{P(i)}^{e}\big) & \text{if } \Delta_{r} = \{\eta_{1}, \dots, \eta_{k}\}, \text{ where } P: \{1, \dots, k\} \to \{1, \dots, k\} \text{ is a permutation with} \\ & \quad \forall i, j.\ i < j \Rightarrow (\eta_{P(j)}, \eta_{P(i)}) \in \mathsf{eo} \end{cases} \tag{1}$$

In the above definition, when $\Delta_r$ is non-empty, we define a permutation $P$ of the events in $\Delta_r$ whose order is the inverse of the effector order $\mathsf{eo}$. This ensures that if $(\eta_i, \eta_j) \in \mathsf{eo}$, then $\eta_j^e$ occurs before $\eta_i^e$ in the sequence passed to the CRDT operation $o$, which effectively applies $\eta_i^e$ before $\eta_j^e$ to obtain the generating state for $o$.
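Equation (1) can be read operationally: arrange the visible events so that later events in eo come first, then apply the sequence from the right, which applies the effectors in eo order. In this Python sketch, events are identified by hypothetical string names, and eo is assumed to be given as a transitive set of pairs that totally orders the history. The function returns the generating state $\pi(\sigma_s)$, to which $\hat{o}$ is then applied.

```python
from typing import Any, Callable, Dict, List, Set, Tuple


def effector_sequence(history: Set[str], eo: Set[Tuple[str, str]]) -> List[str]:
    """Order the history as in Eq. (1): if (a, b) is in eo, b comes before a."""
    # With eo a transitive total order, an event's rank is its number of predecessors.
    rank = {ev: sum(1 for (x, y) in eo if y == ev and x in history) for ev in history}
    return sorted(history, key=lambda ev: rank[ev], reverse=True)


def generating_state(
    sigma_s: Any,
    history: Set[str],
    eo: Set[Tuple[str, str]],
    effectors: Dict[str, Callable[[Any], Any]],
) -> Any:
    """Apply the sequence right to left, i.e. each effector in eo order."""
    state = sigma_s
    for ev in reversed(effector_sequence(history, eo)):
        state = effectors[ev](state)
    return state
```

With effectors `{"e1": lambda s: s + 10, "e2": lambda s: s * 2}` and `eo = {("e1", "e2")}`, the sequence is `["e2", "e1"]`, and the generating state from 1 is `(1 + 10) * 2 = 22`.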

The following rule describes the transitions allowed in $S_{P,\Psi}$:

$$\begin{array}{c} \Delta_{r} \subseteq \Delta \quad o \in O \quad \sigma_{s} \in \Sigma \quad \mathsf{eo}_{r} \text{ is a total order on } \Delta_{r} \\ \mathsf{eo} \subseteq \mathsf{eo}_{r} \quad \text{fresh } \mathsf{id} \quad \eta = (\mathsf{id}, o, \sigma_{s}, \Delta_{r}, \mathsf{eo}_{r}) \\ \Delta' = \Delta \cup \{\eta\} \quad \mathsf{vis}' = \mathsf{vis} \cup \{ (\eta', \eta) \mid \eta' \in \Delta_{r} \} \quad \Psi(\Delta', \mathsf{vis}', \mathsf{eo}') \\ \hline (\Delta, \mathsf{vis}, \mathsf{eo}) \xrightarrow{\eta} (\Delta', \mathsf{vis}', \mathsf{eo}') \end{array}$$

The rule describes the effect of executing a new operation $o$, which begins by selecting a subset of already completed events ($\Delta_r$) and a total order $\mathsf{eo}_r$ on these events that obeys the global effector order $\mathsf{eo}$. This mimics applying the operation $o$ on an arbitrary replica on which the events of $\Delta_r$ have been applied in the order $\mathsf{eo}_r$. A new event $\eta$ corresponding to the issued operation $o$ is computed; it labels the transition and is also added to the current configuration. All the events in $\Delta_r$ are visible to the new event $\eta$, which is reflected in the new visibility relation $\mathsf{vis}'$. The system moves to the new configuration $(\Delta', \mathsf{vis}', \mathsf{eo}')$, which must satisfy the consistency policy $\Psi$. Note that even though the general transition rule allows the event to pick any arbitrary start state $\sigma_s$, we restrict the start state of all events in a **well-formed execution** to be the initial CRDT state $\sigma_{init}$, i.e., the state in which all replicas begin their execution. A trace of $S_{P,\Psi}$ is a sequence of transitions. Let $[\![S_{P,\Psi}]\!]$ be the set of all finite traces. Given a trace $\tau$, $L(\tau)$ denotes all events (i.e., labels) in $\tau$.

**Definition 1 (Well-formed Execution).** *A trace* τ ∈ ⟦S<sub>P,Ψ</sub>⟧ *is a well-formed execution if it begins from the empty configuration* C<sub>init</sub> = ({}, {}, {}) *and* ∀η ∈ L(τ)*,* σ<sub>s</sub>(η) = σ<sub>init</sub>*.*

Let WF(S<sub>P,Ψ</sub>) denote all well-formed executions of S<sub>P,Ψ</sub>. The **consistency policy** Ψ(Δ, vis, eo) is a formula constraining the events in Δ and the relations vis and eo defined over these events. Below, we illustrate how to express certain well-known consistency policies in our framework:


For Eventual Consistency (EC) [3], we place no constraints on the visibility relation and require the global effector order to be empty. This reflects the fact that in EC, any number of events can occur concurrently at different replicas, and hence a replica can witness any arbitrary subset of events, which may be applied in any order. In Causal Consistency (CC) [14], an event is applied at a replica only if all the events it causally depends on have already been applied. An event η<sub>1</sub> is causally dependent on η<sub>2</sub> if η<sub>1</sub> was generated at a replica where either η<sub>2</sub> or any other event causally dependent on η<sub>2</sub> had already been applied. The visibility relation vis captures causal dependency, and by making vis transitive, we ensure that all causal dependencies of events in Δ<sub>r</sub> are also present in Δ<sub>r</sub> (because in the transition rule, Ψ is checked on the updated visibility relation, which relates events in Δ<sub>r</sub> with the newly generated event). Further, causally dependent events must be applied in the same order at all replicas, which we capture by asserting that vis implies eo.

In RedBlue Consistency (RB) [13], a subset of CRDT operations (O<sub>r</sub> ⊆ O) are synchronized, so that they must occur in the same order at all replicas. We express RB in our framework by requiring the visibility relation to be total among events whose operations are in O<sub>r</sub>. In Parallel Snapshot Isolation (PSI) [23], two events that conflict with each other (because they write to a common variable) are not allowed to execute concurrently, but are synchronized across all replicas to be executed in the same order. Similar to [10], we assume that when a CRDT is used under PSI, its state space Σ is a map from variables to values, and every operation generates an effector that simply writes to certain variables. We assume that Wr(η<sup>e</sup>) returns the set of variables written by the effector η<sup>e</sup>, and express PSI in our framework by requiring that events which write a common variable are applied in the same order (determined by their visibility relation) across all replicas; furthermore, the policy requires the visibility relation among such events to be total. Finally, in Strong Consistency (SC), the visibility relation is total and all effectors are applied in the same order at all replicas.
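The policies just described can be read as predicates over a finite configuration. The Python sketch below is one possible encoding, written from the prose above rather than from the paper's own formulas: events are integer ids, `vis` and `eo` are sets of pairs, `writes` plays the role of Wr, and for RB we additionally assume that synchronized events are applied in visibility order (vis implies eo on them), which the prose suggests but does not state. All function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Rel = Set[Tuple[int, int]]


def transitive(r: Rel) -> bool:
    return all((a, d) in r for (a, b) in r for (c, d) in r if b == c)


def total_on(r: Rel, evs: Iterable[int]) -> bool:
    xs = list(evs)
    return all((x, y) in r or (y, x) in r for x in xs for y in xs if x != y)


def ec(events: Set[int], vis: Rel, eo: Rel, **_) -> bool:
    return not eo                          # empty global effector order


def cc(events: Set[int], vis: Rel, eo: Rel, **_) -> bool:
    return transitive(vis) and vis <= eo   # vis transitive and vis implies eo


def rb(events: Set[int], vis: Rel, eo: Rel, *, ops: Dict[int, str],
       sync_ops: Set[str], **_) -> bool:
    red = {e for e in events if ops[e] in sync_ops}
    # Assumption: synchronized pairs are also applied in visibility order.
    ordered = all((a, b) in eo for (a, b) in vis if a in red and b in red)
    return total_on(vis, red) and ordered


def psi(events: Set[int], vis: Rel, eo: Rel, *,
        writes: Dict[int, FrozenSet[str]], **_) -> bool:
    for x in events:
        for y in events:
            if x == y or not (writes[x] & writes[y]):
                continue                   # no common written variable
            if (x, y) not in vis and (y, x) not in vis:
                return False               # conflicting events ran concurrently
            if (x, y) in vis and (x, y) not in eo:
                return False               # not applied in visibility order
    return True


def sc(events: Set[int], vis: Rel, eo: Rel, **_) -> bool:
    return total_on(vis, events) and vis <= eo


# Usage: eta_2 = Remove(a) was generated after seeing eta_1 = Add(a).
E = {1, 2}
ops = {1: "Add", 2: "Remove"}
writes = {1: frozenset({"a"}), 2: frozenset({"a"})}
seen = {(1, 2)}
print(ec(E, seen, set()), cc(E, seen, set()), cc(E, seen, {(1, 2)}))  # True False True
```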

Given an execution τ ∈ ⟦S<sub>P,Ψ</sub>⟧ and a transition C →<sup>η</sup> C' in τ, we associate with η a set of replica states Σ<sub>η</sub> that the event can potentially witness, obtained by applying to the start state σ<sub>s</sub>(η) every permutation of the effectors visible to η that obeys the global effector order. Formally, assuming η = (eid, o, σ<sub>s</sub>, {η<sub>1</sub>, ..., η<sub>k</sub>}, eo<sub>r</sub>) and C = (Δ, vis, eo):

$$\begin{array}{c} \Sigma_{\eta} = \{\eta^{e}_{P(1)} \circ \eta^{e}_{P(2)} \circ \dots \circ \eta^{e}_{P(k)}(\sigma_{s}) \mid P : \{1, \dots, k\} \to \{1, \dots, k\}, \\ \mathsf{eo}_{P} \text{ is a total order},\ i < j \Rightarrow (\eta_{P(j)}, \eta_{P(i)}) \in \mathsf{eo}_{P},\ \mathsf{eo} \subseteq \mathsf{eo}_{P}\} \end{array}$$

In the above definition, for all valid local effector orders eo<sub>P</sub>, we compute the CRDT states obtained by applying those effectors to the start CRDT state; these states constitute Σ<sub>η</sub>. The replica on which the original event η was generated would have witnessed one of these states.
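The set Σ<sub>η</sub> and the convergence check of Definition 2 can be computed by brute force for a handful of effectors. This hypothetical Python sketch enumerates application orders directly (position 0 is applied first) rather than the inverse permutation P; both formulations describe the same set of states. States as frozensets, effectors as plain functions, and a pair (i, j) in `eo` meaning "effector i is applied before effector j" are assumptions made for illustration.

```python
from itertools import permutations
from typing import Callable, FrozenSet, List, Set, Tuple

State = FrozenSet[str]
Effector = Callable[[State], State]


def witnessable_states(start: State, effectors: List[Effector],
                       eo: Set[Tuple[int, int]]) -> Set[State]:
    """Sigma_eta: states reached by applying the visible effectors
    (indexed 0..k-1) in every order that extends the global order eo."""
    states: Set[State] = set()
    for order in permutations(range(len(effectors))):   # order[0] applied first
        pos = {e: n for n, e in enumerate(order)}
        if any(pos[i] > pos[j] for i, j in eo):
            continue                                     # violates eo, skip
        s = start
        for e in order:
            s = effectors[e](s)
        states.add(s)
    return states


def convergent(start: State, effectors: List[Effector],
               eo: Set[Tuple[int, int]]) -> bool:
    # Definition 2: the event is convergent iff Sigma_eta is a singleton.
    return len(witnessable_states(start, effectors, eo)) == 1


add_a: Effector = lambda s: s | frozenset({"a"})
rem_a: Effector = lambda s: s - frozenset({"a"})
print(convergent(frozenset(), [add_a, rem_a], set()))      # False
print(convergent(frozenset(), [add_a, rem_a], {(0, 1)}))   # True
```

With no effectors visible, `permutations` yields a single empty order, so Σ<sub>η</sub> is just {σ<sub>s</sub>}, matching the case Δ<sub>r</sub> = ∅.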

**Definition 2 (Convergent Event).** *Given an execution* τ ∈ ⟦S<sub>P,Ψ</sub>⟧ *and an event* η ∈ L(τ)*,* η *is convergent if* Σ<sub>η</sub> *is a singleton.*

**Definition 3 (Strong Eventual Consistency).** *A CRDT* (Σ, O, σ<sub>init</sub>) *achieves strong eventual consistency* (SEC) *under a weak consistency specification* Ψ *if for all well-formed executions* τ ∈ WF(S<sub>P,Ψ</sub>) *and for all events* η ∈ L(τ)*,* η *is convergent.*

An event is convergent if all valid permutations of visible events according to the specification Ψ lead to the same state. This corresponds to the requirement that if two replicas have witnessed the same set of operations, they must be in the same state. A CRDT achieves SEC if all events in all executions are convergent.

#### **4 Automated Verification**

In order to show that a CRDT achieves SEC under a consistency specification, we need to show that all events in any execution are convergent, which in turn requires us to show that any valid permutation of valid subsets of events in an execution leads to the same state. This is a hard problem because we have to reason about executions of unbounded length involving unbounded sets of effectors, and must reconcile the declarative event-based specifications of weak consistency with the states generated during execution. To make the problem tractable, we use a two-fold strategy. First, we show that if any pair of effectors generated during any execution either commute with each other or are forced to be applied in the same order by the consistency policy, then the CRDT achieves SEC. Second, we develop an inductive proof rule to show that *all* pairs of effectors generated during any (potentially unbounded) execution obey the above property. To ensure soundness of the proof rule, we place some reasonable assumptions on the consistency policy, which (intuitively) require behaviorally equivalent events to be treated the same by the policy, regardless of context (i.e., the length of the execution history at the time the event is applied). We then extract a simple sufficient condition, which we call *non-interference to commutativity*, that captures the heart of the inductive argument. Notably, this condition can be automatically checked for different CRDTs under different consistency policies using off-the-shelf theorem provers, thus providing a pathway to automated parameterized verification of CRDTs.

Given a transition (Δ, vis, eo) →<sup>η</sup> C, we write eo<sub>η</sub> for the global effector order eo in the starting configuration of η. We first show that a sufficient condition for an event to be convergent is that any two events in its history either commute or are related by the global effector order.

**Lemma 1.** *Given an execution* τ ∈ ⟦S<sub>P,Ψ</sub>⟧ *and an event* η = (id, o, σ<sub>s</sub>, Δ<sub>r</sub>, eo<sub>r</sub>) ∈ L(τ)*, if for all* η<sub>1</sub>, η<sub>2</sub> ∈ Δ<sub>r</sub> *such that* η<sub>1</sub> ≠ η<sub>2</sub>*, either* η<sub>1</sub><sup>e</sup> ∘ η<sub>2</sub><sup>e</sup> = η<sub>2</sub><sup>e</sup> ∘ η<sub>1</sub><sup>e</sup> *or* eo<sub>η</sub>(η<sub>1</sub>, η<sub>2</sub>) *or* eo<sub>η</sub>(η<sub>2</sub>, η<sub>1</sub>)*, then* η *is convergent.*<sup>3</sup>

<sup>3</sup> All proofs can be found in the extended version [15] of the paper.

We now present a property that consistency policies must obey for our verification methodology to be soundly applied. First, we define the notion of behavioral equivalence of events:

**Definition 4 (Behavioral Equivalence).** *Two events* η<sub>1</sub> = (id<sub>1</sub>, o<sub>1</sub>, σ<sub>1</sub>, Δ<sub>1</sub>, eo<sub>1</sub>) *and* η<sub>2</sub> = (id<sub>2</sub>, o<sub>2</sub>, σ<sub>2</sub>, Δ<sub>2</sub>, eo<sub>2</sub>) *are behaviorally equivalent if* η<sub>1</sub><sup>e</sup> = η<sub>2</sub><sup>e</sup> *and* o<sub>1</sub> = o<sub>2</sub>*.*

That is, behaviorally equivalent events perform the same operation and produce the same effector. We write η<sub>1</sub> ≡ η<sub>2</sub> to indicate that they are behaviorally equivalent.

**Definition 5 (Behaviorally Stable Consistency Policy).** *A consistency policy* Ψ *is behaviorally stable if for all* Δ, vis, eo, Δ', vis', eo'*,* η<sub>1</sub>, η<sub>2</sub> ∈ Δ *and* η'<sub>1</sub>, η'<sub>2</sub> ∈ Δ'*, the following holds:*

$$\begin{aligned} &\Big(\Psi(\Delta, \mathsf{vis}, \mathsf{eo}) \wedge \Psi(\Delta', \mathsf{vis}', \mathsf{eo}') \wedge \eta_1 \equiv \eta_1' \wedge \eta_2 \equiv \eta_2' \wedge \big(\mathsf{vis}(\eta_1, \eta_2) \Leftrightarrow \mathsf{vis}'(\eta_1', \eta_2')\big)\Big) \\ &\quad \Rightarrow \big(\mathsf{eo}(\eta_1, \eta_2) \Leftrightarrow \mathsf{eo}'(\eta_1', \eta_2')\big) \end{aligned} \tag{2}$$

Behaviorally stable consistency policies treat behaviorally equivalent events that have the same visibility relation among them in the same manner, by enforcing the same effector order. All the consistency policies that we discussed in the previous section (which represent the best-known policies in the literature) are behaviorally stable:

**Lemma 2.** *EC, CC, PSI, RB and SC are behaviorally stable.*

EC does not enforce any effector ordering and hence is trivially behaviorally stable. CC forces causally dependent events to be applied in the same order, and hence behaviorally equivalent events that have the same visibility relation will be forced into the same effector order. RB forces events whose operations belong to a specific subset to be in the same order, and since behaviorally equivalent events perform the same operation, they are forced into the same effector order. Similarly, PSI forces events writing to a common variable to be in the same order, and since behaviorally equivalent events generate the same effector, they also write to the same variables and hence are forced into the same effector order. SC forces all events into the same order, which is equal to the visibility order, and hence is also trivially behaviorally stable. In general, behaviorally stable consistency policies do not consider the context in which events occur, but instead rely only on the observable behavior of the events to constrain their ordering. A simple example of a consistency policy that is not behaviorally stable is one that maintains bounded concurrency [12] by limiting the number of concurrent operations across all replicas to a fixed bound. Such a policy would synchronize two events only if they occur in a context where keeping them concurrent would violate the bound, but behaviorally equivalent events in a different context may not be synchronized.

For executions under a behaviorally stable consistency policy, the global effector order between events only grows in an execution: if two events η<sub>1</sub> and η<sub>2</sub> in the history of some event η are related by eo<sub>η</sub>, then whenever they later occur in the history of any other event, they are related by the same effector order. Hence, we can define a common global effector order for an execution. Given an execution τ ∈ ⟦S<sub>P,Ψ</sub>⟧, the effector order eo<sub>τ</sub> ⊆ L(τ) × L(τ) is an anti-symmetric relation defined as follows:

$$\mathsf{eo}_{\tau} = \{ (\eta_1, \eta_2) \mid \exists \eta \in L(\tau).\ (\eta_1, \eta_2) \in \mathsf{eo}_{\eta} \}$$

Similarly, we define vis<sub>τ</sub> to be the common visibility relation for an execution τ, which is simply the vis relation in the final configuration of τ.

**Definition 6 (Commutative modulo Consistency Policy).** *Given a CRDT* P*, a behaviorally stable weak consistency specification* Ψ *and an execution* τ ∈ ⟦S<sub>P,Ψ</sub>⟧*, two events* η<sub>1</sub>, η<sub>2</sub> ∈ L(τ) *such that* η<sub>1</sub> ≠ η<sub>2</sub> *commute modulo the consistency policy* Ψ *if either* η<sub>1</sub><sup>e</sup> ∘ η<sub>2</sub><sup>e</sup> = η<sub>2</sub><sup>e</sup> ∘ η<sub>1</sub><sup>e</sup> *or* eo<sub>τ</sub>(η<sub>1</sub>, η<sub>2</sub>) *or* eo<sub>τ</sub>(η<sub>2</sub>, η<sub>1</sub>)*.*
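Definition 6 can be mirrored in a short Python sketch. Since effectors are functions, the sketch can only compare f ∘ g and g ∘ f on a finite list of sample states, so it is a testing aid rather than a proof of commutativity; `eo_tau` takes the union of the per-event orders eo<sub>η</sub>, as in the definition of eo<sub>τ</sub>. All names, and the toy set effectors used in the usage lines, are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set, Tuple

State = FrozenSet[str]
Effector = Callable[[State], State]
Rel = Set[Tuple[int, int]]


def eo_tau(per_event_eo: Dict[int, Rel]) -> Rel:
    """eo_tau: union over all events eta of the order eo_eta seen by eta."""
    out: Rel = set()
    for rel in per_event_eo.values():
        out |= rel
    return out


def commute_on(f: Effector, g: Effector, samples: Iterable[State]) -> bool:
    """Bounded check of f . g == g . f, on the given sample states only."""
    return all(f(g(s)) == g(f(s)) for s in samples)


def commute_modulo(e1: int, e2: int, eff: Dict[int, Effector], eo: Rel,
                   samples: Iterable[State]) -> bool:
    # Definition 6: ordered either way by eo_tau, or the effectors commute.
    return ((e1, e2) in eo or (e2, e1) in eo
            or commute_on(eff[e1], eff[e2], list(samples)))


add_a: Effector = lambda s: s | frozenset({"a"})
rem_a: Effector = lambda s: s - frozenset({"a"})
add_b: Effector = lambda s: s | frozenset({"b"})
eff = {1: add_a, 2: rem_a, 3: add_b}
samples = [frozenset(), frozenset({"a"}), frozenset({"b"})]
eo = eo_tau({2: set(), 3: {(1, 2)}})                  # order recorded by event 3
print(commute_modulo(1, 2, eff, set(), samples))      # False
print(commute_modulo(1, 2, eff, eo, samples))         # True
print(commute_modulo(1, 3, eff, set(), samples))      # True
```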

The following lemma is a direct consequence of Lemma 1:

**Lemma 3.** *Given a CRDT* P *and a behaviorally stable consistency specification* Ψ*, if for all* τ ∈ WF(S<sub>P,Ψ</sub>) *and for all* η<sub>1</sub>, η<sub>2</sub> ∈ L(τ) *such that* η<sub>1</sub> ≠ η<sub>2</sub>*,* η<sub>1</sub> *and* η<sub>2</sub> *commute modulo the consistency policy* Ψ*, then* P *achieves SEC under* Ψ*.*

Our goal is to show that all pairs of distinct events in any execution commute modulo the consistency policy, so that Lemma 3 applies. However, executions can be arbitrarily long and have an unbounded number of events. Hence, for events occurring in such large executions, we instead consider behaviorally equivalent events in a smaller execution and show that they commute modulo the consistency policy, which, by stability of the consistency policy, directly translates to their commutativity in the larger context. Recall that the effector generated by an operation depends on its start state and the sequence of other effectors applied to that state. To generate behaviorally equivalent events with arbitrarily long histories in short executions, we summarize these long histories into the start states of events, and use commutativity itself as an inductive property of these start states. That is, we ask: if two events with arbitrary start states and empty histories commute modulo Ψ, does adding another event to their histories still allow them to commute modulo Ψ?

**Definition 7 (Non-interference to Commutativity).** (Non-Interf) *A CRDT* P = (Σ, O, σ<sub>init</sub>) *satisfies non-interference to commutativity under a consistency policy* Ψ *if and only if the following conditions hold:*


Condition (1) corresponds to the base case of our inductive argument and requires that in well-formed executions with 2 events, both events commute modulo Ψ. For condition (2), our intention is to consider two events η<sub>a</sub> and η<sub>b</sub> with arbitrary histories that can occur in any well-formed execution and, assuming that they commute modulo Ψ, show that even after the addition of another event to their histories, they continue to commute. We use CRDT states σ<sub>1</sub>, σ<sub>2</sub> to summarize the histories of the two events, and construct behaviorally equivalent events (η<sub>1</sub> ≡ η<sub>a</sub> and η<sub>2</sub> ≡ η<sub>b</sub>) that take σ<sub>1</sub>, σ<sub>2</sub> as their start states. That is, if η<sub>a</sub> produced the effector o(σ<sub>init</sub>, π),<sup>4</sup> where o is the CRDT operation corresponding to η<sub>a</sub> and π is the sequence of effectors in its history, we leverage the observation that o(σ<sub>init</sub>, π) = o(π(σ<sub>init</sub>), ε), where ε is the empty sequence, and, taking σ<sub>1</sub> = π(σ<sub>init</sub>), we obtain the behaviorally equivalent event η<sub>1</sub>, i.e., η<sub>1</sub><sup>e</sup> ≡ η<sub>a</sub><sup>e</sup>. A similar analysis establishes that η<sub>2</sub><sup>e</sup> ≡ η<sub>b</sub><sup>e</sup>. However, since we have no way of characterizing states σ<sub>1</sub> and σ<sub>2</sub> that are obtained by applying arbitrary sequences of effectors, we use commutativity itself as an identifying characteristic, focusing only on those σ<sub>1</sub> and σ<sub>2</sub> for which the events η<sub>1</sub> and η<sub>2</sub> commute modulo Ψ.

The interfering event is also summarized by another CRDT state σ3, and we require that after suffering interference from this new event, the original two events would continue to commute modulo Ψ. This would essentially establish that any two events with any history would commute modulo Ψ in these small executions, which by the behavioral stability of Ψ would translate to their commutativity in any execution.

**Theorem 1.** *Given a CRDT* P *and a behaviorally stable consistency policy* Ψ*, if* P *satisfies non-interference to commutativity under* Ψ*, then* P *achieves* SEC *under* Ψ*.*

**Example:** Let us apply the proposed verification strategy to the ORSet CRDT shown in Fig. 2. Under EC, condition (1) of Non-Interf fails, because in the execution C<sub>init</sub> →<sup>η<sub>1</sub></sup> C<sub>1</sub> →<sup>η<sub>2</sub></sup> C<sub>2</sub>, where o(η<sub>1</sub>) = Add(a,i), o(η<sub>2</sub>) = Remove(a) and vis(η<sub>1</sub>, η<sub>2</sub>), the events η<sub>1</sub> and η<sub>2</sub> do not commute modulo EC, since (a,i) would be present in the source replica of Remove(a). However, η<sub>1</sub> and η<sub>2</sub> would commute modulo CC, since they would be related by the effector order. Now, moving to condition (2) of Non-Interf, we limit ourselves to source replica states σ<sub>1</sub> and σ<sub>2</sub> where Add(a,i) and Remove(a) do commute modulo CC. If vis<sub>τ</sub>(η<sub>1</sub>, η<sub>2</sub>), then after interference, in execution τ', vis<sub>τ'</sub>(η'<sub>1</sub>, η'<sub>2</sub>), in which case η'<sub>1</sub> and η'<sub>2</sub> trivially commute modulo CC (because they would be related by the effector order). On the other hand, if ¬vis<sub>τ</sub>(η<sub>1</sub>, η<sub>2</sub>), then for η<sub>1</sub> and η<sub>2</sub> to commute modulo CC, the effectors η<sub>1</sub><sup>e</sup> and η<sub>2</sub><sup>e</sup> must themselves commute, which implies that (a,i) ∉ σ<sub>2</sub>. Now, consider any execution τ' with an interfering operation η<sub>3</sub>. If η<sub>3</sub> is another Add(a,i') operation, then i' ≠ i, so even if η<sub>3</sub> is visible to η'<sub>2</sub>, the effector η'<sub>2</sub><sup>e</sup> will not remove (a,i), and η'<sub>1</sub> and η'<sub>2</sub> would commute. Similarly, if η<sub>3</sub> is another Remove(a) operation, it can only remove tagged versions of a from the source replica of η'<sub>2</sub>, so the effector η'<sub>2</sub><sup>e</sup> would not remove (a,i).
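The two cases of this argument can be replayed concretely. The Python sketch below assumes the usual op-based ORSet (Fig. 2 is not reproduced here): the state is a set of (element, tag) pairs, Add(a,i) inserts (a,i), and Remove(a) deletes exactly the tagged copies of a observed at its source replica. The helper names are hypothetical, and commutativity is checked only on a few sample states.

```python
from typing import Callable, FrozenSet, List, Tuple

Pair = Tuple[str, int]                    # (element, unique tag)
State = FrozenSet[Pair]
Effector = Callable[[State], State]
Operation = Callable[[State], Effector]   # source state -> effector


def add(a: str, i: int) -> Operation:
    # Add(a, i): the effector inserts the tagged pair (a, i).
    return lambda _src: (lambda s: s | {(a, i)})


def remove(a: str) -> Operation:
    # Remove(a): the effector deletes exactly the tagged copies of a
    # that were present at the source replica when Remove was issued.
    def gen(src: State) -> Effector:
        observed = frozenset(p for p in src if p[0] == a)
        return lambda s: s - observed
    return gen


def commute(f: Effector, g: Effector, samples: List[State]) -> bool:
    return all(f(g(s)) == g(f(s)) for s in samples)


init: State = frozenset()
samples: List[State] = [init, frozenset({("a", 0)}),
                        frozenset({("a", 0), ("a", 1)})]

# vis(eta_1, eta_2): Remove(a) is issued on a replica that already applied Add(a,1).
e1 = add("a", 1)(init)
e2_seen = remove("a")(e1(init))
print(commute(e1, e2_seen, samples))      # False: the two effectors must be ordered

# Concurrent case: Remove(a) is issued on a state that does not contain (a, 1).
e2_conc = remove("a")(frozenset({("a", 0)}))
print(commute(e1, e2_conc, samples))      # True
```

In the first case CC rescues convergence because vis(η<sub>1</sub>, η<sub>2</sub>) forces the effector order; under EC nothing orders the two effectors, which is exactly the Non-Interf-1 failure described above.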

<sup>4</sup> Note that in a well-formed execution, the start state is always σ<sub>init</sub>.

#### **5 Experimental Results**

In this section, we present the results of applying our verification methodology to a number of CRDTs under different consistency models. We collected CRDT implementations from a number of sources [1,19,20], and since all of the existing implementations assume a very weak consistency model (primarily CC), we additionally implemented a few CRDTs of our own that are intended to work only under stronger consistency schemes but are better in terms of time/space complexity and ease of development. Our implementations are not written in any specific language, but are instead specified abstractly, akin to the definitions given in Figs. 1 and 2. To specify CRDT states and operations, we fix an abstract language that contains uninterpreted datatypes (used for specifying elements of sets, lists, etc.), a set datatype with support for various set operations (add, delete, union, intersection, projection, lookup), a tuple datatype (along with operations to create tuples and project components) and a special uninterpreted datatype equipped with a total order for identifiers. Note that the set datatype used in our abstract language is different from the Set CRDT, as it is only intended to perform set operations locally at a replica. All existing CRDT definitions can be naturally expressed in this framework.

Here, we revert to the op-based specification of CRDTs. For a given CRDT P = (Σ, O, σ<sub>init</sub>), we convert all its operations into FOL formulas relating the source, input and output replica states. That is, for a CRDT operation o : Σ → Σ → Σ, we create a predicate o : Σ × Σ × Σ → 𝔹 such that o(σ<sub>s</sub>, σ<sub>i</sub>, σ<sub>o</sub>) is true if and only if o(σ<sub>s</sub>)(σ<sub>i</sub>) = σ<sub>o</sub>. Since CRDT states are typically expressed as sets, we axiomatize set operations to express their semantics in FOL.

In order to specify a consistency model, we introduce a sort for events and binary predicates vis and eo over this sort. Here, we can take advantage of the declarative specification of consistency models and directly encode them in FOL. Given an encoding of CRDT operations and a consistency model, our verification strategy is to determine whether the Non-Interf property holds. Since both conditions of this property only involve executions of finite length (at most 3), we can directly encode them as UNSAT queries by asking for executions that break the conditions. For condition (1), we query for the existence of two events η<sub>1</sub> and η<sub>2</sub>, along with vis and eo predicates that satisfy the consistency specification Ψ, such that these events are not related by eo and their effectors do not commute. For condition (2), we query for the existence of events η<sub>1</sub>, η<sub>2</sub>, η<sub>3</sub> and their respective start states σ<sub>1</sub>, σ<sub>2</sub>, σ<sub>3</sub>, such that η<sub>1</sub> and η<sub>2</sub> commute modulo Ψ but, after interference from η<sub>3</sub>, they are not related by eo and do not commute. Both queries are encoded in EPR [18], a decidable fragment of FOL; hence, if the CRDT operations and the consistency policy can also be encoded in a decidable fragment of FOL (which is the case in all our experiments), our verification strategy is decidable. We write Non-Interf-1 and Non-Interf-2 for the two conditions of Non-Interf.
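The paper discharges these queries with Z3 over an EPR encoding. As a much weaker stand-in, the Python sketch below first writes each operation in the relational form o(σ<sub>s</sub>, σ<sub>i</sub>, σ<sub>o</sub>) and then exhaustively searches a tiny finite universe for a counterexample to Non-Interf-1 under EC. The Simple-Set semantics used here (Add inserts, Remove deletes, neither consults the source state) is our assumption about Fig. 1, as is the two-element universe; all names are hypothetical. A hit is a genuine counterexample for this toy model, but an empty search over a finite universe proves nothing in general.

```python
from itertools import combinations, product
from typing import FrozenSet, Iterator, List, Tuple

Op = Tuple[str, str]                     # ("Add" | "Remove", element)
UNIVERSE = ("a", "b")                    # tiny element domain (assumption)
STATES: List[FrozenSet[str]] = [frozenset(c) for r in range(len(UNIVERSE) + 1)
                                for c in combinations(UNIVERSE, r)]
OPS: List[Op] = [(kind, x) for kind in ("Add", "Remove") for x in UNIVERSE]


def holds(op: Op, src: FrozenSet[str], inp: FrozenSet[str],
          out: FrozenSet[str]) -> bool:
    """Relational form o(sigma_s, sigma_i, sigma_o): true iff
    o(sigma_s)(sigma_i) == sigma_o, under the assumed Simple-Set semantics."""
    kind, x = op
    return out == (inp | {x} if kind == "Add" else inp - {x})


def apply(op: Op, src: FrozenSet[str], inp: FrozenSet[str]) -> FrozenSet[str]:
    # Recover the (unique) output state from the predicate by search.
    return next(o for o in STATES if holds(op, src, inp, o))


def non_interf_1_violations() -> Iterator[Tuple[Op, Op, bool, FrozenSet[str]]]:
    """Two-event well-formed executions in which the effectors are unordered
    (EC keeps eo empty) yet disagree on some input state."""
    init: FrozenSet[str] = frozenset()
    for op1, op2 in product(OPS, repeat=2):
        for sees_first in (False, True):          # is eta_1 visible to eta_2?
            src2 = apply(op1, init, init) if sees_first else init
            for s in STATES:
                left = apply(op1, init, apply(op2, src2, s))
                right = apply(op2, src2, apply(op1, init, s))
                if left != right:
                    yield op1, op2, sees_first, s


print(next(non_interf_1_violations(), None))
# (('Add', 'a'), ('Remove', 'a'), False, frozenset())
```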

Figure 4 shows the results of applying the proposed methodology on different CRDTs. We used Z3 to discharge our satisfiability queries. For every combination of a CRDT and a consistency policy, we write ✗ to indicate that verification of Non-Interf failed, while ✓ indicates that it was satisfied.

**Fig. 4.** Convergence of CRDTs under different consistency policies.

We also report the verification time taken by Z3 for every CRDT across all consistency policies, executing on a standard desktop machine. We picked the three collection datatypes for which CRDTs have been proposed, i.e., Set, List and Graph, and for each such datatype, we consider multiple variants that provide a tradeoff between consistency requirements and implementation complexity. Apart from EC, CC and PSI, we also use a combination of PSI and RB, which only enforces PSI between selected pairs of operations (in contrast to simple RB, which would enforce SC between all selected pairs). Note that when verifying a CRDT under PSI, we assume that the set operations are implemented as Boolean assignments, and the write set Wr consists of the elements added/removed. We are unaware of any prior effort that has been successful in automatically verifying *any* CRDT, let alone those that exhibit the complexity of the ones considered here.

**Set:** The Simple-Set CRDT in Fig. 1 does not converge under EC or CC, but achieves convergence under PSI+RB, which only synchronizes Add and Remove operations to the same elements, while all other operations continue to run under EC, since they commute with each other. As explained earlier, ORSet does not converge under EC and violates Non-Interf-1. ORSet with tombstones converges under EC as well, since it uses a separate set (called a tombstone) to keep track of removed elements. USet is another implementation of the Set CRDT, which converges under the assumptions that an element is only added once and that removes only take effect if the element is already present in the source replica. USet converges only under PSI, because under any weaker consistency model Non-Interf-2 fails: Add(a) interferes and breaks the commutativity of Add(a) and Remove(a). Notice that as the consistency level weakens, implementations need to keep more and more information to maintain convergence: compute unique ids, tag elements with them, or keep track of deleted elements. If the underlying replicated store supports stronger consistency levels such as PSI, simpler definitions are sufficient.

**List:** The List CRDT maintains a total ordering between its elements. It supports two operations: AddRight(e,a) adds a new element a to the right of an existing element e, while Remove(e) removes e from the list. We use the implementation in [1] (called RGA), which uses time-stamped insertion trees. To maintain the integrity of the tree structure, the immediate predecessor of every list element must be present in the list; as a result, the operations AddRight(a,b) and AddRight(b,c) do not commute. Hence RGA does not converge under EC, because Non-Interf-1 is violated, but it converges under CC.
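Why AddRight(a,b) and AddRight(b,c) fail to commute can be seen on a deliberately simplified list. This Python sketch is not RGA's time-stamped insertion tree: it stores the list as a tuple, and it models the predecessor requirement by making AddRight(e,a) a no-op when e is absent, which is our assumption. The function name `add_right` is hypothetical.

```python
from typing import Callable, Tuple

ListState = Tuple[str, ...]


def add_right(e: str, a: str) -> Callable[[ListState], ListState]:
    """Effector of AddRight(e, a): insert a immediately to the right of e.
    Modelling assumption: if e is absent, the effector does nothing."""
    def eff(s: ListState) -> ListState:
        if e not in s:
            return s
        i = s.index(e)
        return s[:i + 1] + (a,) + s[i + 1:]
    return eff


ab = add_right("a", "b")
bc = add_right("b", "c")
start: ListState = ("a",)
print(bc(ab(start)))   # ('a', 'b', 'c'): b is inserted first, then c after it
print(ab(bc(start)))   # ('a', 'b'): c is dropped because b was not yet present
```

Under CC, an AddRight(b,c) issued on a replica that already contains b has AddRight(a,b) in its causal history, so vis implies eo and every replica applies the two effectors in the first order.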

To make adds and removes involving the same list element commute, RGA maintains a tombstone set for all deleted list elements. This can be expensive as deleted elements may potentially need to be tracked forever, even with garbage collection. We consider a slight modification of RGA called RGA-No-Tomb which does not keep track of deleted elements. This CRDT now has a convergence violation under CC (because of Non-Interf-1), but achieves convergence under PSI+RB where we enforce PSI only for pairs of AddRight and Remove operations.

**Graph:** The Graph CRDT maintains sets of vertices and edges and supports operations to add and remove vertices and edges. The 2P2P-Graph specification uses separate 2P-Sets for both vertices and edges, where a 2P-Set itself maintains two sets for the addition and removal of elements. While 2P-Sets themselves converge under EC, the 2P2P-Graph violates Non-Interf-1 on AddVertex(v) and RemoveVertex(v) (and similarly for edges), since it removes a vertex from a replica only if it is already present. We verify that it converges under CC. Graphs require an integrity constraint that edges in the edge-set must always be incident on vertices in the vertex-set. Since concurrent RemoveVertex(v) and AddEdge(v,v') can violate this constraint, the 2P2P-Graph uses the internal structure of the 2P-Set, which keeps track of deleted elements, and considers an edge to be in the edge set only if its vertices are not in the vertex tombstone set (leading to a remove-wins strategy).

Building a graph CRDT can be viewed as an exercise in composing CRDTs: it uses two ORSet CRDTs, keeping the internal implementation of the ORSet opaque and relying only on its interface. The Graph-with-ORSet implementation uses separate ORSets for vertices and edges and explicitly maintains the graph integrity constraint. We find violations of Non-Interf-1 between RemoveVertex(v) and AddEdge(v,v'), and between RemoveVertex(v) and RemoveEdge(v,v'), under both EC and CC. Under PSI+RB (enforcing RB on the above two pairs of operations), we were able to show convergence.

When a CRDT passes Non-Interf under a consistency policy, we can guarantee that it achieves SEC under that policy. However, if it fails Non-Interf, it may or may not converge. In particular, if it fails Non-Interf-1, it will definitely not converge (because Non-Interf-1 constructs a well-formed execution), but if it passes Non-Interf-1 and fails Non-Interf-2, it may still converge because of the imprecision of Non-Interf-2. There are two sources of imprecision, both concerning the start states of the events picked in the condition: (1) we only use commutativity as a distinguishing property of the start states, but this may not be a sufficiently strong inductive invariant, and (2) we place no constraints on the start state of the interfering operation. In practice, we have found that for all cases except U-Set, convergence violations manifest via failure of Non-Interf-1. If Non-Interf-2 breaks, we can search for well-formed executions of greater length, up to a bound. For U-Set, we were successful in adopting this approach, and were able to find a non-convergent well-formed execution of length 3.

#### **6 Related Work and Conclusions**

Reconciling concurrent updates in a replicated system is an important, well-studied problem in distributed applications, first studied in the context of collaborative editing systems [17]. Incorrect implementation of replicated sets in Amazon's Dynamo system [7] motivated the design of CRDTs as a principled approach to implementing replicated data types. Devising correct implementations has proven to be challenging, however, as evidenced by the myriad pre-conditions specified in the various CRDT implementations [20].

Burckhardt *et al.* [6] present an abstract event-based framework to describe executions of CRDTs under different network conditions; they also propose a rigorous correctness criterion in the form of abstract specifications. Their proof strategy, which is neither automated nor parametric on consistency policies, verifies CRDT implementations against these specifications by providing a simulation invariant between CRDT states and event structures. Zeller *et al.* [24] also require simulation invariants to verify convergence, although they only target state-based CRDTs. Gomes *et al.* [9] provide mechanized proofs of convergence for ORSet and RGA CRDTs under causal consistency, but their approach is neither automated nor parametric.

A number of earlier efforts [2,10–12,22] have looked at the problem of verifying state-based invariants in distributed applications. These techniques typically target applications built using CRDTs, and assume their underlying correctness. Because they target correctness specifications in the form of state-based invariants, it is unclear if their approaches can be applied directly to the convergence problem we consider here. Other approaches [4,5,16] have also looked at the verification problem of transactional programs running on replicated systems under weak consistency, but these proposals typically use serializability as the correctness criterion, adopting a "last-writer wins" semantics, rather than convergence, to deal with concurrent updates.

This paper demonstrates the automated verification of CRDTs under different weak consistency policies. We rigorously define the relationship between commutativity and convergence, formulating the notion of commutativity modulo consistency policy as a sufficient condition for convergence. While we require a non-trivial inductive argument to show that non-interference to commutativity is sufficient for convergence, the condition itself is designed to be simple and amenable to automated verification using off-the-shelf theorem provers. We have successfully applied the proposed verification strategy to all major CRDTs, additionally motivating the need for parameterization in consistency policies by showing variants of existing CRDTs that are simpler in terms of implementation complexity but converge under different weak consistency models.

**Acknowledgments.** We thank the anonymous reviewers for their insightful comments. This material is based upon work supported by the National Science Foundation under Grant No. CCF-SHF 1717741 and the Air Force Research Lab under Grant No. FA8750-17-1-0006.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **What's Wrong with On-the-Fly Partial Order Reduction**

Stephen F. Siegel

University of Delaware, Newark, DE, USA siegel@udel.edu

**Abstract.** Partial order reduction and on-the-fly model checking are well-known approaches for improving model checking performance. The two optimizations interact in subtle ways, so care must be taken when using them in combination. A standard algorithm combining the two optimizations, published over twenty years ago, has been widely studied and deployed in popular model checking tools. Yet the algorithm is incorrect. Counterexamples were discovered using the Alloy analyzer. A fix for a restricted class of property automata is proposed.

**Keywords:** Model checking · Partial order reduction · On-the-fly · Spin

#### **1 Introduction**

*Partial order reduction* (POR) refers to a family of model checking techniques used to reduce the size of the state space that must be explored when verifying a property of a program. The techniques vary, but all share the core observation that when two independent operations are enabled in a state, it is often safe to ignore traces that begin with one of them. A large number of POR techniques have been explored, differing in details such as the range of properties to which they apply. This paper focuses on *ample set* POR [4], an approach which applies to stutter-invariant properties and is used in the model checker Spin [8].

In the automata-theoretic view of model checking, the negation of the property to be verified is represented by an ω-automaton. The basic algorithm computes the product of this automaton with the state space of the program. The language of the product is empty if and only if the program cannot violate the property. *On-the-fly* model checking refers to an optimization of this basic algorithm in which the enumeration of the reachable program states, computation of the product, and language emptiness check are interleaved, rather than occurring in sequence.

These two optimizations must be combined with care, because they interact in subtle ways.<sup>1</sup> A standard algorithm for on-the-fly ample set POR is described in [12] and in further detail in [13]. I shall refer to this algorithm as the *combined algorithm*. Theorem 4.2 of [13] asserts the soundness of the combined algorithm, and a proof of the theorem is also given in [13].

<sup>1</sup> Previous work, for example, has dealt with problems, distinct from those discussed in this paper, that arise when combining nested depth first search and POR [7,14].

© The Author(s) 2019. I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 478–495, 2019. https://doi.org/10.1007/978-3-030-25543-5\_27

The proof has a gap. This was pointed out in [16, Sect. 5], with details in [15]. The gap was rediscovered in the course of developing mechanized correctness proofs for model checking algorithms; an explicit counterexample to the incorrect proof step was also found ([2, Sect. 8.4.5] and [3, Sect. 5]). The fact that the proof is erroneous, however, does not imply the theorem is wrong. To the best of my knowledge, no one has yet produced a proof or a counterexample for the soundness of the combined algorithm.

In this paper, I show that the combined algorithm is not sound; a counterexample is given in Sect. 3.1. I found this counterexample by modeling the combined algorithm in Alloy and using the Alloy analyzer [11] to check its soundness. Sect. 4 describes this model. Spin's POR is based on the combined algorithm, and in Sect. 5, Spin is seen to return an incorrect result on a Promela model derived from the theoretical counterexample.

There is a small adjustment to the combined algorithm, yielding an algorithm that is arguably more natural and that returns the correct result on the previous counterexample; this is described in Sect. 6. It turns out that this variant is also unsound, as demonstrated by another Alloy-produced counterexample. However, in Sect. 7, I show that this variant is sound if certain restrictions are placed on the property automaton.

#### **2 Preliminaries**

**Definition 1.** *A* finite state program *is a triple* P = ⟨T, Q, ι⟩*, where* Q *is a finite set of* states*,* ι ∈ Q *is the* initial state*, and* T *is a finite set of* operations*. Each operation* α ∈ T *is a function from a set* en<sub>α</sub> ⊆ Q *to* Q*.*

Fix a finite state program P = ⟨T, Q, ι⟩.

**Definition 2.** *For* q ∈ Q*, define* en(q) = {α ∈ T | q ∈ en<sub>α</sub>}*.*

**Definition 3.** *An* execution *of* P *is an infinite sequence of operations* α<sub>1</sub>α<sub>2</sub> ··· *that* generates *the sequence of states* ξ = q<sub>0</sub>q<sub>1</sub>q<sub>2</sub> ··· *such that* q<sub>0</sub> = ι *and, for* i ≥ 0*,* q<sub>i</sub> ∈ en<sub>α<sub>i+1</sub></sub> *and* q<sub>i+1</sub> = α<sub>i+1</sub>(q<sub>i</sub>)*. An* admissible *sequence is any segment of an execution.*

**Definition 4.** *A* Büchi automaton *is a tuple* ℬ = ⟨S, Δ, Σ, δ, F⟩*, where* S *is a finite set of* automaton states*,* Δ ⊆ S *is the set of* initial states*,* Σ *is a finite set called the* alphabet*,* δ ⊆ S × Σ × S *is the* transition relation*, and* F ⊆ S *is the set of* accepting states*. The* language *of* ℬ*, denoted* 𝓛(ℬ)*, is the set of all* ξ ∈ Σ<sup>ω</sup> *generated by infinite paths in* ℬ *that start in an initial state and pass through an accepting state infinitely often.*

Fix a finite set AP of *atomic propositions* and let Σ = 2<sup>AP</sup>. Fix an *interpretation mapping* for P, i.e., a function L: Q → Σ.

**Definition 5.** *The* language of P*, denoted* 𝓛(P)*, is the set of all infinite words* L(q<sub>0</sub>)L(q<sub>1</sub>)··· ∈ Σ<sup>ω</sup>*, where* q<sub>0</sub>q<sub>1</sub> ··· *is the sequence of states generated by an execution of* P*.*

**Definition 6.** *A language* 𝓛 ⊆ Σ<sup>ω</sup> *is* stutter-invariant *if, for any* a<sub>0</sub>, a<sub>1</sub>, … ∈ Σ *and positive integers* i<sub>0</sub>, i<sub>1</sub>, …*,* a<sub>0</sub>a<sub>1</sub> ··· ∈ 𝓛 ⇔ a<sub>0</sub><sup>i<sub>0</sub></sup>a<sub>1</sub><sup>i<sub>1</sub></sup> ··· ∈ 𝓛*, where* a<sup>i</sup> *denotes the concatenation of* i *copies of* a*.*
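Stutter-invariance is a property of languages of infinite words, but its basic operation is easy to illustrate on finite prefixes. The Python sketch below uses hypothetical helper names (`stutter`, `destutter`) and models letters of Σ = 2<sup>AP</sup> as frozensets of proposition names, an assumption made only for illustration. `stutter` builds a<sub>0</sub><sup>i<sub>0</sub></sup>a<sub>1</sub><sup>i<sub>1</sub></sup>··· for a finite word, and `destutter` collapses maximal runs of equal letters; two finite words are stutter-equivalent exactly when their collapsed forms coincide. It does not decide whether a language is stutter-invariant.

```python
from itertools import groupby
from typing import Hashable, Sequence, Tuple


def stutter(word: Sequence[Hashable], counts: Sequence[int]) -> Tuple[Hashable, ...]:
    """Hypothetical helper: return a0^{i0} a1^{i1} ... for a finite word a0 a1 ..."""
    if len(word) != len(counts) or any(c < 1 for c in counts):
        raise ValueError("need one positive repetition count per letter")
    return tuple(a for a, c in zip(word, counts) for _ in range(c))


def destutter(word: Sequence[Hashable]) -> Tuple[Hashable, ...]:
    """Hypothetical helper: collapse every maximal run of equal letters to one letter."""
    return tuple(letter for letter, _run in groupby(word))


# Letters are sets of atomic propositions (assumed frozenset encoding).
EMPTY, P = frozenset(), frozenset({"p"})
w = (EMPTY, P, P, EMPTY)
v = stutter(w, (3, 1, 2, 1))
```

Here `v` repeats the letters of `w` different numbers of times, yet both collapse to the same word ∅{p}∅.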

**Definition 7.** *Let* ℬ = ⟨S, Δ, Σ, δ, F⟩ *be a Büchi automaton with alphabet* Σ*. The* product of P *and* ℬ *is the Büchi automaton*

$$P \otimes \mathcal{B} = \langle Q \times S, \{\iota\} \times \Delta, T \times \Sigma, \delta\_{\otimes}, Q \times F \rangle,$$

*where*

$$\delta\_{\otimes} = \{ (\langle q, s \rangle, \langle \alpha, \sigma \rangle, \langle q', s' \rangle) \mid \sigma = L(q) \land \langle s, \sigma, s' \rangle \in \delta \land q' = \alpha(q) \}.$$

*Note 1.* A transition from product state x = ⟨q, s⟩ can be viewed as taking place in two steps. First, a transition ⟨s, L(q), s′⟩ ∈ δ of ℬ executes, leading to an "intermediate state" x′ = ⟨q, s′⟩. Then a program transition from q to q′ = α(q) executes, culminating in y = ⟨q′, s′⟩. While this is a good mental model, the product automaton does not necessarily contain a transition from x to x′ or from x′ to y. The intermediate state x′ is not even necessarily reachable in the product. The transition in the product goes directly from x to y with label ⟨α, L(q)⟩.
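To make Definition 7 concrete, here is a minimal Python sketch that enumerates the reachable part of P ⊗ ℬ. A program is assumed to be given as a map from each state q to its enabled operations and their targets, and each product edge reads a Büchi transition labeled L(q) together with one enabled operation in a single step, as in Note 1. The sample input encodes the program of Fig. 1 and ℬ<sub>1</sub>; the encoding (α a self-loop at both states, β from A to B, p true only at B) is inferred from the description in Sect. 3.1 and the Promela model of Fig. 2, and all names are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, List, Set, Tuple

PState = Hashable
BState = Hashable
Label = FrozenSet[str]
Edge = Tuple[Tuple[PState, BState], Tuple[str, Label], Tuple[PState, BState]]


def product_space(prog: Dict[PState, Dict[str, PState]], pinit: PState,
                  label: Dict[PState, Label],
                  btrans: List[Tuple[BState, Label, BState]],
                  binit: Set[BState]) -> Tuple[Set[Tuple[PState, BState]], List[Edge]]:
    """Hypothetical helper: reachable states and edges of P (x) B (Definition 7)."""
    start = [(pinit, s) for s in binit]
    seen = set(start)
    stack = list(start)
    edges: List[Edge] = []
    while stack:
        q, s = stack.pop()
        sigma = label[q]  # the property transition must read L(q)
        for src, a, s2 in btrans:
            if src != s or a != sigma:
                continue
            for op, q2 in prog[q].items():  # every operation enabled at q
                edges.append(((q, s), (op, sigma), (q2, s2)))
                if (q2, s2) not in seen:
                    seen.add((q2, s2))
                    stack.append((q2, s2))
    return seen, edges


# Assumed encoding of the Fig. 1 program and of B1 (0 initial, 1 accepting).
PROG = {"A": {"alpha": "A", "beta": "B"}, "B": {"alpha": "B"}}
LABEL = {"A": frozenset(), "B": frozenset({"p"})}
BTRANS = [(0, frozenset(), 0), (0, frozenset(), 1), (1, frozenset({"p"}), 1)]

states, edges = product_space(PROG, "A", LABEL, BTRANS, {0})
```

On this input the sketch finds 4 product states and 5 edges, the same counts Spin reports for the full search in Sect. 5.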

It is well-known that

$$
\mathcal{L}(P) \cap \mathcal{L}(\mathcal{B}) = \emptyset \Leftrightarrow \mathcal{L}(P \otimes \mathcal{B}) = \emptyset .
$$

In the context of model checking, B is used to represent the negation of a desirable property; the program P satisfies the property if, and only if, no execution of P is accepted by B, i.e., L(P) ∩ L(B) = ∅. The automaton B may be generated from a (negated) LTL formula, but that assumption is not needed here.

The goal of "offline" (not on-the-fly) partial order reduction is to generate some subspace P′ of P with the guarantee that

$$
\mathcal{L}(P') \cap \mathcal{L}(\mathcal{B}) = \emptyset \Leftrightarrow \mathcal{L}(P) \cap \mathcal{L}(\mathcal{B}) = \emptyset
$$

Emptiness of 𝓛(P′) ∩ 𝓛(ℬ), or equivalently of 𝓛(P′ ⊗ ℬ), can be decided in various ways, such as a nested depth first search (NDFS) [5].
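For small explicit graphs, emptiness can also be decided by a direct (quadratic) check instead of NDFS: the language is nonempty if and only if some reachable accepting state can reach itself in one or more steps. The Python sketch below follows that characterization, which is also what the Alloy predicate `nonemptyLang` of Sect. 4 expresses; it is not the NDFS of [5], and the function names are hypothetical. The sample graphs are the full and reduced spaces of Fig. 1 as described in Sect. 3.1, with product states written as (program state, Büchi state) and Büchi state 1 accepting.

```python
from typing import Callable, Dict, Hashable, Iterable, Set

Node = Hashable


def reachable(succ: Dict[Node, Set[Node]], sources: Iterable[Node]) -> Set[Node]:
    """Hypothetical helper: all nodes reachable from `sources` in zero or more steps."""
    seen = set(sources)
    stack = list(seen)
    while stack:
        for y in succ.get(stack.pop(), ()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def nonempty_lang(succ: Dict[Node, Set[Node]], init: Iterable[Node],
                  accepting: Callable[[Node], bool]) -> bool:
    """True iff some reachable accepting node lies on a cycle (an accepting lasso)."""
    for x in reachable(succ, init):
        # reachable from x's successors = reachable from x in one or more steps
        if accepting(x) and x in reachable(succ, succ.get(x, ())):
            return True
    return False


FULL = {("A", 0): {("A", 0), ("A", 1), ("B", 0), ("B", 1)}, ("B", 1): {("B", 1)}}
REDUCED = {("A", 0): {("A", 0), ("A", 1), ("B", 0)}}
```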

#### **3 On-the-Fly Partial Order Reduction**

In on-the-fly model checking, the state space of the product automaton is enumerated directly, without first enumerating the program states. Adding POR to the mix means that at each state reached in the product automaton, some subset of enabled transitions will be explored. The goal is to ensure that if the language of the full product automaton is nonempty, then the language of the resulting reduced automaton must be nonempty.

To make this precise, fix a finite state program P = ⟨T, Q, ι⟩, a set AP of atomic propositions, an interpretation L: Q → Σ = 2<sup>AP</sup>, and a Büchi automaton ℬ = ⟨S, Δ, Σ, δ, F⟩. Let 𝒜 = P ⊗ ℬ.

**Definition 8.** *A function* amp: Q × S → 2<sup>T</sup> *is an* ample selector *if* amp(q, s) ⊆ en(q) *for all* q ∈ Q, s ∈ S*. Each* amp(q, s) *is an* ample set*.*

An ample selector determines a subautomaton 𝒜′ = reduced(𝒜, amp) of 𝒜: 𝒜′ is defined exactly as in Definition 7, except that the transition relation has the additional restriction that α ∈ amp(q, s′):

$$\mathcal{A}' = \langle Q \times S, \{\iota\} \times \Delta, T \times \Sigma, \delta', Q \times F \rangle \tag{1}$$

$$\begin{aligned} \delta' = \big\{ &(\langle q, s \rangle, \langle \alpha, \sigma \rangle, \langle q', s' \rangle) \in (Q \times S) \times (T \times \Sigma) \times (Q \times S) \;\big|\; \\ &\sigma = L(q) \wedge \langle s, \sigma, s' \rangle \in \delta \wedge \alpha \in \mathsf{amp}(q, s') \wedge q' = \alpha(q) \big\}. \end{aligned} \tag{2}$$

**Definition 9.** *An ample selector* amp *is* POR-sound *if the following holds:*

$$
\mathcal{L}(\mathsf{reduced}(\mathcal{A}, \mathsf{amp})) = \emptyset \Leftrightarrow \mathcal{L}(P) \cap \mathcal{L}(\mathcal{B}) = \emptyset .
$$

The goal is to define some constraints on an ample selector that guarantee it is POR-sound. Before stating the constraints, we need two more concepts:

**Definition 10.** *An* independence relation *is an irreflexive and symmetric relation* I ⊆ T × T *satisfying the following: if* (α, β) ∈ I *and* q ∈ en<sub>α</sub> ∩ en<sub>β</sub>*, then* α(q) ∈ en<sub>β</sub>*,* β(q) ∈ en<sub>α</sub>*, and* α(β(q)) = β(α(q))*.*

Fix an independence relation I. We say α and β are *independent* if (α, β) ∈ I, and *dependent* if (α, β) ∉ I.

**Definition 11.** *An operation* α ∈ T *is* invisible with respect to L *if, for all* q ∈ en<sub>α</sub>*,* L(q) = L(α(q))*.*

*Note 2.* The definition in [13] is slightly different. Given an LTL formula φ over AP, let AP′ be the set of atomic propositions occurring syntactically in φ. The definition in [13] says α is *invisible in* φ if, for all p ∈ AP′ and q ∈ en<sub>α</sub>, p ∈ L(q) ⇔ p ∈ L(α(q)). However, there is no loss of generality using Definition 11, since one can define a new interpretation L′: Q → 2<sup>AP′</sup> by L′(q) = L(q) ∩ AP′. Then α is invisible in φ if, and only if, α is invisible with respect to L′, and the results of this paper can be applied without modification to P, AP′, and L′.
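For an explicitly given program, the conditions of Definitions 10 and 11 and the restricted interpretation of Note 2 are direct to check. The Python sketch below mirrors the Alloy predicates `independent` and `invisible` of Sect. 4; the function names are hypothetical. `commute` checks whether a pair may belong to an independence relation I (the condition is symmetric in its two arguments), and the sample program is the one of Fig. 1 as inferred from Sect. 3.1.

```python
from typing import Dict, FrozenSet, Hashable

Prog = Dict[Hashable, Dict[str, Hashable]]  # state -> {enabled op: next state}
Labels = Dict[Hashable, FrozenSet[str]]


def commute(prog: Prog, a: str, b: str) -> bool:
    """Hypothetical helper: Definition 10 condition for the pair (a, b)."""
    if a == b:
        return False  # an independence relation is irreflexive
    for ops in prog.values():
        if a in ops and b in ops:
            qa, qb = ops[a], ops[b]
            # each stays enabled after the other ...
            if b not in prog[qa] or a not in prog[qb]:
                return False
            # ... and both orders reach the same state
            if prog[qa][b] != prog[qb][a]:
                return False
    return True


def invisible(prog: Prog, label: Labels, a: str) -> bool:
    """Hypothetical helper: Definition 11, applying a never changes L(q)."""
    return all(label[ops[a]] == label[q] for q, ops in prog.items() if a in ops)


def restrict(label: Labels, ap: FrozenSet[str]) -> Labels:
    """Hypothetical helper: the interpretation L'(q) = L(q) & AP' of Note 2."""
    return {q: props & ap for q, props in label.items()}


PROG = {"A": {"alpha": "A", "beta": "B"}, "B": {"alpha": "B"}}
LABEL = {"A": frozenset(), "B": frozenset({"p"})}
```

On this program α and β commute, α is invisible, and β is not, which matches the claims made about Fig. 1 in Sect. 3.1.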

We now define the following constraints on an ample selector amp:<sup>2</sup>

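**C0** For all q ∈ Q, s ∈ S: en(q) ≠ ∅ ⟹ amp(q, s) ≠ ∅.

**C1** For all q ∈ Q, s ∈ S: along every path in P that starts at q, no operation that is dependent on an operation in amp(q, s) can occur before some operation in amp(q, s) occurs.

**C2** For all q ∈ Q, s ∈ S: if amp(q, s) ≠ en(q), then every operation in amp(q, s) is invisible.

<sup>2</sup> I am using the numbering from [4]. In [13], **C2** and **C3** are swapped.
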
Condition **C3** is the interesting one. The combined algorithm of [13] enforces it using a DFS (the outer search of the NDFS) of the reduced space and the following protocol: given a new state ⟨q, s⟩ that has just been pushed onto the stack, first iterate over all Büchi transitions ⟨s, L(q), s′⟩ departing from s and labeled by L(q). For each of these, a candidate ample set for amp(q, s′) that satisfies the first three conditions is computed; this computation does not depend on s′. If any operation in that candidate set leads back to a state on the search stack (a "back edge"), a different candidate is tried, and the process is repeated until a satisfactory one is found. If no such candidate is found, en(q) is used for the ample set.

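Hence the process for choosing the ample set depends on the current state of the search. Let x ∈ Q and y<sub>1</sub>, y<sub>2</sub> ∈ S. If y<sub>1</sub> ≠ y<sub>2</sub>, it is not necessarily the case that amp(x, y<sub>1</sub>) = amp(x, y<sub>2</sub>), even though the candidate computation itself does not depend on the property state, because it is possible that when ⟨x, y<sub>1</sub>⟩ was encountered, a back edge existed for a candidate, but when ⟨x, y<sub>2</sub>⟩ was encountered, there was no back edge.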
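The selection protocol described above can be sketched in Python as follows. This is a simplified, hypothetical rendering of the outer DFS only: the candidate ample sets for each program state (`CANDIDATES`) are supplied by hand and assumed to satisfy C0–C2, and the inner search of the NDFS is omitted. Ample sets for every Büchi successor s′ are chosen when ⟨q, s⟩ is pushed, and a candidate is rejected if one of its operations leads to a product state ⟨α(q), s′⟩ that is currently on the stack. The program and ℬ<sub>1</sub> are the ones of Fig. 1, with an encoding inferred from Sect. 3.1.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[str, int]
PROG = {"A": {"alpha": "A", "beta": "B"}, "B": {"alpha": "B"}}
LABEL = {"A": frozenset(), "B": frozenset({"p"})}
BTRANS = [(0, frozenset(), 0), (0, frozenset(), 1), (1, frozenset({"p"}), 1)]
# Hypothetical candidate ample sets, tried in order; en(q) is the fallback.
CANDIDATES: Dict[str, List[Set[str]]] = {"A": [{"alpha"}], "B": []}


def combined_dfs(init: Node = ("A", 0)):
    """Hypothetical sketch of the combined algorithm's outer DFS."""
    visited: Set[Node] = set()
    on_stack: Set[Node] = set()
    explored: Set[Tuple[Node, Node]] = set()

    def dfs(x: Node) -> None:
        q, s = x
        visited.add(x)
        on_stack.add(x)
        choices = []  # (s2, amp(q, s2)) for every Buchi transition s -L(q)-> s2
        for src, sigma, s2 in BTRANS:
            if src != s or sigma != LABEL[q]:
                continue
            chosen = set(PROG[q])  # default: en(q)
            for cand in CANDIDATES[q]:
                # reject a candidate that has a back edge to the search stack
                if all((PROG[q][op], s2) not in on_stack for op in cand):
                    chosen = cand
                    break
            choices.append((s2, chosen))
        for s2, ops in choices:
            for op in sorted(ops):
                y = (PROG[q][op], s2)
                explored.add((x, y))
                if y not in visited:
                    dfs(y)
        on_stack.discard(x)

    dfs(init)
    return visited, explored


visited, explored = combined_dfs()
```

When ⟨A, 0⟩ is pushed, the candidate {α} is rejected for s′ = 0 (the self-loop on A0 is a back edge) but accepted for s′ = 1, so the search never reaches ⟨B, 1⟩.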

#### **3.1 Counterexample**

Theorem 4.2 of [13] can be expressed as follows: if 𝓛(ℬ) is stutter-invariant and is the language of an LTL formula, and amp satisfies **C0**–**C3**, then amp is POR-sound.

**Fig. 1.** Counterexample to combined theorem. Left: program and interpretation. Center: property automaton *ℬ*<sub>1</sub> and ample selector function. Right: the reachable product state space; dashed edges are in the full, but not reduced, space.

A counterexample to this claim is given in Fig. 1. The program consists of two states, A and B, and two operations, α and β. There is a single atomic proposition, p, which is *false* at A and *true* at B. Note that α and β are independent. Also, α is invisible, and β is not.

The property automaton, ℬ<sub>1</sub>, is shown in Fig. 1 (center top). It has two states, numbered 0 and 1. State 1 is the sole accepting state. The language consists of all infinite words of the following form: a finite nonempty prefix of ∅s followed by an infinite sequence of {p}s. This language is stutter-invariant, and is the language of the LTL formula ¬p ∧ (¬p **U** **G** p).

The ample selector is specified by the table (center bottom). Notice that amp(A, 1) = {α} ≠ en(A), but the other three ample sets are full. **C0** holds because the ample sets are never empty. **C1** holds because β is independent of α. **C2** holds because α is invisible. The reachable product space is shown in Fig. 1 (right). In any DFS of reduced(𝒜, amp), the only back edge is the self-loop on A0 labeled ⟨α, ∅⟩. Since amp(A, 0) is full, **C3** holds. Yet there is an accepting path in the full space, but not in the reduced space.
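As a cross-check of this counterexample, the Python sketch below builds the full space and reduced(𝒜, amp) according to (2), where an edge that reads the Büchi transition from s to s′ may use α only if α ∈ amp(q, s′), and then tests each space for an accepting lasso. The encoding of the program and of ℬ<sub>1</sub> is inferred from the text and the Promela model of Fig. 2 (α a self-loop at A and B, β from A to B); the helper names are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Node = Tuple[str, int]
# Assumed encoding of Fig. 1 (program, interpretation, B1, ample selector).
PROG = {"A": {"alpha": "A", "beta": "B"}, "B": {"alpha": "B"}}
LABEL = {"A": frozenset(), "B": frozenset({"p"})}
BTRANS = [(0, frozenset(), 0), (0, frozenset(), 1), (1, frozenset({"p"}), 1)]
ACCEPTING = {1}
# Only amp(A, 1) is a proper subset of en(A); the other ample sets are full.
AMP = {("A", 0): {"alpha", "beta"}, ("A", 1): {"alpha"},
       ("B", 0): {"alpha"}, ("B", 1): {"alpha"}}


def successors(amp: Optional[Dict[Node, Set[str]]] = None) -> Dict[Node, Set[Node]]:
    """Reachable product from <A, 0>: the full space if amp is None, otherwise
    reduced(A, amp) as in (2), filtering each edge by amp(q, s2)."""
    succ: Dict[Node, Set[Node]] = {}
    stack = [("A", 0)]
    while stack:
        x = stack.pop()
        if x in succ:
            continue
        q, s = x
        succ[x] = set()
        for src, sigma, s2 in BTRANS:
            if src != s or sigma != LABEL[q]:
                continue
            for op, q2 in PROG[q].items():
                if amp is None or op in amp[(q, s2)]:
                    succ[x].add((q2, s2))
                    stack.append((q2, s2))
    return succ


def has_accepting_lasso(succ: Dict[Node, Set[Node]]) -> bool:
    """Some reachable accepting state can reach itself in one or more steps."""
    def reach(sources: Set[Node]) -> Set[Node]:
        seen, stack = set(sources), list(sources)
        while stack:
            for y in succ.get(stack.pop(), ()):
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen
    return any(x[1] in ACCEPTING and x in reach(succ[x]) for x in succ)


full, reduced = successors(), successors(AMP)
```

In this encoding the dashed edge of Fig. 1 is the edge from ⟨A, 0⟩ to ⟨B, 1⟩; it is filtered out because β ∉ amp(A, 1), so only the full space has an accepting lasso.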

#### **4 Alloy Models of POR Schemes**

Alloy is a "lightweight formal methods" language and tool. It has been used in a wide variety of contexts, from exploring software designs to studying weak memory-consistency models. An Alloy model specifies *signatures*, each of which defines a type, relations on signatures, and constraints on the signatures and relations. Constraints are expressed in a logic that combines elements of first order logic and relational logic, and includes a transitive closure operator. An *instance* of a model assigns a finite set of *atoms* to each signature, and a finite set of tuples (of the right type) to each relation, in such a way that the constraints are satisfied. The Alloy analyzer can be used to check that an assertion holds on all instances in which the sizes of the signatures are within some specified bounds. The analyzer converts the question of the validity of the assertion into a SAT problem and invokes a SAT solver. Based on the result, it reports either that the assertion holds within the given bounds, or it produces an instance of the model violating the assertion.

I developed an Alloy model to search for counterexamples to various POR claims, such as the one in Sect. 3.1. The model encodes the main concepts of the previous two sections, including program, operations, interpretation, invisibility and independence, property automaton, the product space, ample selectors and the constraints on them, and a language emptiness predicate. The model culminates in an assertion which states that an ample selector satisfying the four constraints is POR-sound.

I was not able to find a way to encode stutter-invariance. In the end, I developed a small set of Büchi automata based on my own intuition of what would make interesting tests. I encoded these in Alloy and used the analyzer to explore all possible programs and ample selectors for each, within the given bounds.

The first part of the model is a simple encoding of a finite state automaton. The following is a listing of file ba.als:

```
module ba -- module for simple model of Büchi automata
sig Sigma {} -- alphabet of BA, valuation on atomic props
sig BState {} -- a state in the Büchi Automaton
one sig Binit extends BState {} -- initial state of BA
sig AState in BState {} -- accepting states of BA
-- a transition has a source state, label, and destination state...
sig BTrans { src: one BState, label: one Sigma, dest: one BState }
```
The alphabet is some unconstrained set Sigma. The set of states is represented by signature BState. There is a single initial state, and any number of accepting states. Each transition has a source and destination state, and label. Relations declared within a signature declaration have that signature as an implicit first argument. So, for example, src is a binary relation of type BTrans × BState. Furthermore, the relation is many-to-one: each transition has exactly one BState atom associated to it by the src relation.

The remaining concepts are incorporated into module por\_v0:

```
1 module por_v0 -- on-the-fly POR variant 0, corresponding to [13]
2 open ba -- import the Büchi automata module
3 sig Operation {} -- program operation
4 sig PState { -- program state
5 label: one Sigma, -- the set of propositions which hold in this state
6 enabled: set Operation, -- the set of all operations enabled at this state
7 nextState: enabled -> one PState, -- the next-state function
8 ample: BState -> set Operation -- ample(q,s)
9 }{ all s: BState | ample[s] in enabled } -- ample sets subsets of enabled
10 fun amp[q: PState, s: BState] : set Operation { q.ample[s] }
11 one sig Pinit extends PState {} -- initial program state
12 fact { -- all program states are reachable from Pinit
13 let r = {q, q': PState | some op: Operation | q.nextState[op]=q'} |
14 PState = Pinit.*r
15 }
16 sig ProdState { -- state in the product of program and property automaton
17 pstate: PState, -- the program state component
18 bstate: BState, -- the property state component
19 nextFull: set ProdState, -- all next states in the full product space
20 nextReduced: set ProdState -- all next states in the reduced product space
21 }
22 one sig ProdInit extends ProdState {} -- initial product state
23 pred transitionInProduct[q,q': PState, op: Operation, s,s': BState] {
24 q->op->q' in nextState
25 some t : BTrans | t.src = s and t.dest = s' and t.label = q.label
26 }
27 pred nextProd[x: ProdState, op: Operation, x': ProdState] {
28 transitionInProduct[x.pstate, x'.pstate, op, x.bstate, x'.bstate]
29 }
30 pred independent[op1, op2 : Operation] {
31 all q: PState | (op1+op2 in q.enabled) implies (
32 op2 in q.nextState[op1].enabled and
33 op1 in q.nextState[op2].enabled and
34 q.nextState[op1].nextState[op2] = q.nextState[op2].nextState[op1])
35 }
36 pred invisible[op: Operation] {
37 all q: PState | op in q.enabled => q.nextState[op].label = q.label
38 }
39 fact C0 { all q: PState, s: BState | some q.enabled => some amp[q,s] }
40 fact C1 {
41 all q: PState, s: BState | let A=amp[q,s] |
42 let r = { q1, q2: PState | some op: Operation-A |
43 q1->op->q2 in nextState } |
44 all q': q.*r, op1: q'.enabled-A, op2: A | independent[op1, op2]
45 }
46 fact C2 {
47 all q: PState, s: BState | let A = amp[q,s] |
48 A != q.enabled implies all op: A | invisible[op]
49 }
50 fact C3' {
51 let r = { x, x' : ProdState | x->x' in nextReduced and
52 amp[x.pstate, x'.bstate] != x.pstate.enabled } |
53 no x: ProdState | x in x.^r
54 }
55 fact { -- generate all reachable product states, etc.
56 nextFull = {x,y: ProdState | some op: Operation | nextProd[x,op,y]}
57 nextReduced = {x,y: ProdState |
58 some op: amp[x.pstate, y.bstate] | nextProd[x,op,y]}
59 ProdState = ProdInit.*nextFull
60 all x,y: ProdState | (x.pstate=y.pstate && x.bstate=y.bstate) => x=y
61 ProdInit.pstate = Pinit and ProdInit.bstate = Binit
62 all x: ProdState, op: Operation, q': PState, s': BState |
63 transitionInProduct[x.pstate, q', op, x.bstate, s'] implies
64 some y: ProdState | y.pstate = q' and y.bstate = s'
65 }
66 pred nonemptyLang[r: ProdState->ProdState] { -- r reaches accepting cycle
67 some x: ProdInit.*r | (x.bstate in AState and x in x.^r)
68 }
69 assert PORsoundness { -- if full space has a lasso, so does the reduced
70 nonemptyLang[nextFull] => nonemptyLang[nextReduced]
71 }
```
The facts are constraints that any instance must satisfy; some of the facts are given names for readability. A pred declaration defines a (typed) predicate.

Most aspects of this model are self-explanatory; I will comment only on the less obvious features. The relations nextFull and nextReduced represent the next state relations in the full and reduced spaces, respectively. They are declared in ProdState, but specified completely in the final fact on lines 56–58. Strictly speaking, one could remove those relations and substitute their definitions, but this seemed more convenient. Line 60 asserts that a product state is determined uniquely by its program and property components. Line 61 specifies the initial product state.

Line 59 insists that only states reachable (in the full space) from the initial state will be included in an instance (\* is the reflexive transitive closure operator). Lines 62–64 specify the converse. Hence in any instance of this model, ProdState will consist of exactly the reachable product states in the full space.

The encoding of **C1** is based on the following observation: given q ∈ Q and a set A of operations enabled at q, define r ⊆ Q × Q by removing from the program's next-state relation all edges labeled by operations in A. Then "no operation dependent on an operation in A can occur unless an operation in A occurs first" is equivalent to the statement that on any path from q using edges in r, all enabled operations encountered will either be in A or independent of every operation in A.
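The same observation gives a simple explicit-state check of **C1**. The Python sketch below, with a hypothetical function name and an independence relation passed in as a predicate, explores from q using only operations outside the ample set A and fails as soon as it meets such an operation that is dependent on some operation in A, just as the Alloy fact C1 quantifies over q.*r. The sample program is the one of Fig. 1 as inferred from Sect. 3.1.

```python
from typing import Callable, Dict, Hashable, Set

Prog = Dict[Hashable, Dict[str, Hashable]]  # state -> {enabled op: next state}


def satisfies_c1(prog: Prog, q: Hashable, ample: Set[str],
                 independent: Callable[[str, str], bool]) -> bool:
    """Hypothetical helper: follow only edges labeled by operations outside
    `ample`; every such operation met must be independent of all of `ample`."""
    seen, stack = {q}, [q]
    while stack:
        q1 = stack.pop()
        for op, q2 in prog[q1].items():
            if op in ample:
                continue  # edges labeled by ample operations are not followed
            if not all(independent(op, a) for a in ample):
                return False
            if q2 not in seen:
                seen.add(q2)
                stack.append(q2)
    return True


PROG = {"A": {"alpha": "A", "beta": "B"}, "B": {"alpha": "B"}}
INDEP = {("alpha", "beta"), ("beta", "alpha")}
```

With I = {(α, β), (β, α)}, the ample set {α} at A passes, while it would fail if α and β were dependent.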

Condition **C3** is difficult to encode, in that it depends on specifying a depth-first search. I have replaced it with a weaker condition, which is similar to a well-known cycle proviso in the offline theory:

**C3′** In any cycle in reduced(𝒜, amp), there is a transition from ⟨q, s⟩ to ⟨q′, s′⟩ for which amp(q, s′) = en(q).

Equivalently: if one removes from the reduced product space all such transitions, then the resulting graph should have no cycles. This is the meaning of lines 50–54 (^ is the strict transitive closure operator).
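The acyclicity formulation of the weaker condition above translates directly into code. In the Python sketch below (hypothetical names), the reduced space is given as a list of product edges; edges whose ample set amp(q, s′) equals en(q) are discarded, and a three-color DFS then looks for a cycle in what remains. The sample data are the reduced space and ample selector of Fig. 1 as described in Sect. 3.1.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Node = Tuple[Hashable, Hashable]  # (program state, Buchi state)


def satisfies_c3_prime(reduced_edges: Iterable[Tuple[Node, Node]],
                       amp: Dict[Node, Set[str]],
                       enabled: Dict[Hashable, Set[str]]) -> bool:
    """Hypothetical helper: drop each edge <q,s> -> <q2,s2> with amp(q, s2) = en(q);
    the condition holds iff the remaining graph is acyclic."""
    succ: Dict[Node, List[Node]] = {}
    for (q, s), (q2, s2) in reduced_edges:
        if amp[(q, s2)] != enabled[q]:
            succ.setdefault((q, s), []).append((q2, s2))
    WHITE, GREY, BLACK = 0, 1, 2
    colour: Dict[Node, int] = {}

    def finds_cycle_from(x: Node) -> bool:
        colour[x] = GREY  # x is on the current DFS path
        for y in succ.get(x, []):
            c = colour.get(y, WHITE)
            if c == GREY or (c == WHITE and finds_cycle_from(y)):
                return True
        colour[x] = BLACK
        return False

    return not any(colour.get(x, WHITE) == WHITE and finds_cycle_from(x)
                   for x in list(succ))


ENABLED = {"A": {"alpha", "beta"}, "B": {"alpha"}}
AMP = {("A", 0): {"alpha", "beta"}, ("A", 1): {"alpha"},
       ("B", 0): {"alpha"}, ("B", 1): {"alpha"}}
EDGES = [(("A", 0), ("A", 0)), (("A", 0), ("B", 0)), (("A", 0), ("A", 1))]
```

For Fig. 1 the only remaining edge is the one from ⟨A, 0⟩ to ⟨A, 1⟩, so the condition holds, consistent with the Alloy result.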

The next step is to create tests for specific property automata. This example is for the automaton ℬ<sub>1</sub> of Fig. 1:

```
module ba1
open ba
one sig X0, X1 extends Sigma {}
one sig B1 extends BState {}
one sig T1, T2, T3 extends BTrans {}
fact {
  AState = B1 -- B1 is the sole accepting state
  T1.src=Binit && T1.label=X0 && T1.dest=Binit
  T2.src=Binit && T2.label=X0 && T2.dest=B1
  T3.src=B1 && T3.label=X1 && T3.dest=B1
}
```
The final step is a test that combines the modules above:

```
open por_v0
open ba1
check PORsoundness for exactly 2 Sigma, exactly 2 BState,
  exactly 3 BTrans, 2 Operation, 2 PState, 4 ProdState
```
It places upper bounds on the numbers of operations, program states, and product states while checking the soundness assertion. Using the Alloy analyzer to check the assertion above results in a counterexample like the one in Fig. 1. The runtime is a fraction of a second. The Alloy instance uses two uninterpreted atoms for the elements of Sigma; I have simply substituted the sets <sup>∅</sup> and {p} for them to produce Fig. 1. As we have seen, this counterexample happens to also satisfy the stronger constraint **C3**.

#### **5 Spin**

The POR algorithm used by Spin is described in [10] and is similar to the combined algorithm. We can see what Spin actually does by encoding examples in Promela and executing Spin with and without POR.

```
bit p = 0;
active proctype p0() { p=1 }
active proctype p1() { bit x=0; do :: x=0 od }
never {
  B0: do :: !p :: !p -> break od
  accept_B1: do :: p od
}
```
**Fig. 2.** Promela representation of counterexample using *ℬ*<sub>1</sub> of Fig. 1

Figure 2 shows an encoding of the example of Fig. 1. Transition α corresponds to the assignment x=0, where x is a variable local to p1. Transition β corresponds to the assignment p=1, where p is a shared variable. Applying Spin with the following commands allows one to see the structure of the program graphs for each process, as well as each step in the search of the full space:

```
spin -a test1.pml; cc -o pan -DCHECK -DNOREDUCE pan.c; ./pan -d; ./pan -a
```

I did this with Spin version 6.4.9, the latest stable release at the time of writing. The output indicates that 4 states and 5 transitions are explored, and one state is matched, exactly as in Fig. 1 (right). As expected, the output also reports a violation: a path to an accepting cycle that corresponds to the transition from A0 to B1 followed by the self-loop on B1 repeated forever.

Repeat this experiment without the -DNOREDUCE flag, however, and Spin finds no errors. The output indicates that it misses the transition from A0 to B1.

#### **6 Ignoring the Intermediate States**

An interesting aspect of the combined algorithm is that the ample set is a function of an intermediate state. That is, given a product state x = ⟨q, s⟩, the ample set is determined by the intermediate state x′ = ⟨q, s′⟩ obtained after executing a property transition. This introduces a difference between the on-the-fly scheme and offline schemes, where there is no notion of intermediate state. It also introduces other complexities. For example, it is possible that x′ was reached earlier in the search through some other state ⟨q, s<sub>2</sub>⟩, because of a property transition ⟨s<sub>2</sub>, L(q), s′⟩. How does the algorithm guarantee that the ample set selected for x′ will be the same as the earlier choice? This issue is not addressed in [13] or [10].

These problems go away if one simply makes the ample set a function of the source product state x. The intermediate states do not have to play a role. Specifically, given an ample selector amp, define reduced<sub>2</sub>(𝒜, amp) as in (1) and (2), except replace "α ∈ amp(q, s′)" in (2) with "α ∈ amp(q, s)". Perform the same substitution in **C3** and call the resulting condition **C3**<sub>1</sub>. The weaker version of **C3**<sub>1</sub> is simply:

**C3′**<sub>1</sub> In any cycle in reduced<sub>2</sub>(𝒜, amp), there is a state ⟨q, s⟩ with amp(q, s) = en(q).

Conditions **C0**–**C2** are unchanged. I refer to this scheme as V1, and to the original combined algorithm as V0. The Alloy model of V0 in Sect. 4 can be easily modified to represent V1.

Using V1, the example of Fig. 1 is no longer a counterexample. In fact, Alloy reports there are no counterexamples using B1, at least for small bounds on the program size. Figure 5 gives detailed results for this and other Alloy experiments.
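A small Python sketch makes the V1 claim about Fig. 1 checkable by hand. It builds reduced<sub>2</sub>(𝒜, amp), where each edge leaving ⟨q, s⟩ is filtered by amp(q, s) of the source state, for the ample selector of Fig. 1; it then checks for an accepting lasso and checks the weaker cycle condition of this section (every cycle contains a state with amp(q, s) = en(q)), which is equivalent to the states with proper ample sets inducing an acyclic subgraph. The encoding of the program and ℬ<sub>1</sub> is inferred from Sect. 3.1 and Fig. 2, and all names are hypothetical.

```python
from typing import Dict, Set, Tuple

Node = Tuple[str, int]
# Assumed encoding of the Fig. 1 program, B1, and its ample selector.
PROG = {"A": {"alpha": "A", "beta": "B"}, "B": {"alpha": "B"}}
LABEL = {"A": frozenset(), "B": frozenset({"p"})}
BTRANS = [(0, frozenset(), 0), (0, frozenset(), 1), (1, frozenset({"p"}), 1)]
ACCEPTING = {1}
AMP = {("A", 0): {"alpha", "beta"}, ("A", 1): {"alpha"},
       ("B", 0): {"alpha"}, ("B", 1): {"alpha"}}


def reduced2() -> Dict[Node, Set[Node]]:
    """V1: an edge leaving <q, s> may use op only if op is in amp(q, s)."""
    succ: Dict[Node, Set[Node]] = {}
    stack = [("A", 0)]
    while stack:
        x = stack.pop()
        if x in succ:
            continue
        q, s = x
        succ[x] = set()
        for src, sigma, s2 in BTRANS:
            if src == s and sigma == LABEL[q]:
                for op, q2 in PROG[q].items():
                    if op in AMP[x]:
                        succ[x].add((q2, s2))
                        stack.append((q2, s2))
    return succ


def reach(succ: Dict[Node, Set[Node]], sources: Set[Node]) -> Set[Node]:
    seen, stack = set(sources), list(sources)
    while stack:
        for y in succ.get(stack.pop(), ()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def has_accepting_lasso(succ: Dict[Node, Set[Node]]) -> bool:
    return any(x[1] in ACCEPTING and x in reach(succ, succ[x]) for x in succ)


def weak_cycle_condition(succ: Dict[Node, Set[Node]]) -> bool:
    """States with proper ample sets must induce an acyclic subgraph."""
    proper = {x for x in succ if AMP[x] != set(PROG[x[0]])}
    sub = {x: {y for y in succ[x] if y in proper} for x in proper}
    return not any(x in reach(sub, sub[x]) for x in proper)


space = reduced2()
```

Because amp(A, 0) is full, the edge from ⟨A, 0⟩ to ⟨B, 1⟩ survives under V1, so the accepting lasso through ⟨B, 1⟩ is preserved.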

Unfortunately, Alloy does find a counterexample for a slightly more complicated property automaton, ℬ<sub>2</sub>, which is shown in Fig. 3.

**Fig. 3.** Counterexample to V1 with *ℬ*<sub>2</sub> (center). A0 and A2 have proper ample set {α}.

The program is the same as the one in Sect. 3.1. Automaton ℬ<sub>2</sub> has four states, with state 3 the sole accepting state. The language is the same as that of ℬ<sub>1</sub>: all infinite words formed by concatenating a finite nonempty prefix of ∅s and an infinite sequence of {p}s. If the prefix has odd length, the accepting run begins with the transition 0 → 1; otherwise it begins with the transition 0 → 2.

In the ample selector, only A0 and A2 are not fully enabled:

$$
\begin{array}{c|cccc}
\mathsf{amp} & 0 & 1 & 2 & 3 \\
\hline
A & \{\alpha\} & \{\alpha,\beta\} & \{\alpha\} & \{\alpha,\beta\} \\
B & \{\alpha\} & \{\alpha\} & \{\alpha\} & \{\alpha\} \\
\end{array}
$$

**C0**–**C2** hold for the reasons given in Sect. 3.1. **C3**<sub>1</sub> holds for any DFS in which A2 is pushed onto the stack before A1. In that case, there is no back edge from A2; there will be a back edge when A1 is pushed, but A1 is fully enabled.

#### **7 What's Right**

In this section, I show that POR scheme V1 of Sect. 6 is sound if one introduces certain assumptions on the property automaton. The following definition is similar to the notion of *stutter invariant (SI) automaton* in [6] and to that of *closure under stuttering* in [9]. The main differences derive from the use of Muller automata in [6] and *Büchi transition systems* in [9], while we are dealing with ordinary Büchi automata.

**Definition 12.** *A Büchi automaton* ℬ = ⟨S, {s<sub>init</sub>}, Σ, δ, F⟩ *is in* SI normal form *if it has a single initial state* s<sub>init</sub> *with no incoming edges, and for each* s ∈ S \ {s<sub>init</sub>}*, there is some* a<sub>s</sub> ∈ Σ *such that the following all hold:*


**Lemma 1.** *Let* ℬ *be a Büchi automaton in SI normal form. Suppose* a, b ∈ Σ *and* a ≠ b*. Both of the following hold:*


Following the approach of [6], one can show that the language of an automaton in SI normal form is stutter-invariant. Moreover, any Büchi automaton with a stutter-invariant language can be transformed into SI normal form without changing the language. The conversion satisfies |S′| = O(|Σ||S|), where |S| and |S′| are the numbers of states in the original and new automaton, respectively. For details and proofs, see [17]. An example is given in Fig. 4; the language of ℬ<sub>3</sub> (or ℬ<sub>4</sub>) consists of all words with a finite number of {p}s.

**Fig. 4.** Property automaton *ℬ*<sub>3</sub> and result of transformation to SI normal form, *ℬ*<sub>4</sub>.

**Theorem 1.** *Suppose* ℬ *is in SI normal form and* amp: Q × S → 2<sup>T</sup> *is an ample selector satisfying* **C0**–**C2** *and* **C3′**<sub>1</sub>*. Then* amp *is POR-sound.*

The remainder of this section is devoted to the proof of Theorem 1. The proof is similar to the proof of the offline case in [4].

Let θ be an accepting path in the full space 𝒜. An infinite sequence of accepting paths π<sub>0</sub>, π<sub>1</sub>, … will be constructed, where π<sub>0</sub> = θ. For each i ≥ 0, π<sub>i</sub> will be decomposed as η<sub>i</sub> ∘ θ<sub>i</sub>, where η<sub>i</sub> is a finite path of length i in the *reduced space*, θ<sub>i</sub> is an infinite path, η<sub>i</sub> is a prefix of η<sub>i+1</sub>, and ∘ denotes concatenation. For i = 0, η<sub>0</sub> is empty and θ<sub>0</sub> = θ.

Assume i ≥ 0 and we have defined η<sub>j</sub> and θ<sub>j</sub> for j ≤ i. Write

$$\theta\_i \quad = \ \langle q\_0, s\_0 \rangle \xrightarrow{\langle \alpha\_1, \sigma\_0 \rangle} \langle q\_1, s\_1 \rangle \xrightarrow{\langle \alpha\_2, \sigma\_1 \rangle} \cdots \tag{3}$$

where σ<sub>k</sub> = L(q<sub>k</sub>) for k ≥ 0. Then η<sub>i+1</sub> and θ<sub>i+1</sub> are defined as follows. Let A = amp(q<sub>0</sub>, s<sub>0</sub>). There are two cases:

*Case 1:* α<sub>1</sub> ∈ A. Let η<sub>i+1</sub> be the path obtained by appending the first transition of θ<sub>i</sub> to η<sub>i</sub>, and θ<sub>i+1</sub> the path obtained by removing the first transition from θ<sub>i</sub>.

*Case 2:* α<sub>1</sub> ∉ A. Then there are two sub-cases:

*Case 2a:* Some operation in A occurs in θ<sub>i</sub>. Let n be the index of the first occurrence, so that α<sub>n</sub> ∈ A, but α<sub>j</sub> ∉ A for 1 ≤ j < n. By **C1**, α<sub>j</sub> and α<sub>n</sub> are independent for 1 ≤ j < n. By repeated application of the independence property, there are paths in P

$$\begin{array}{ccccccccccccccc}
q\_0 & \xrightarrow{\alpha\_1} & q\_1 & \xrightarrow{\alpha\_2} & q\_2 & \xrightarrow{\alpha\_3} & \cdots & \xrightarrow{\alpha\_{n-2}} & q\_{n-2} & \xrightarrow{\alpha\_{n-1}} & q\_{n-1} & & & & \\
\downarrow{\scriptstyle \alpha\_n} & & \downarrow{\scriptstyle \alpha\_n} & & \downarrow{\scriptstyle \alpha\_n} & & & & \downarrow{\scriptstyle \alpha\_n} & & \downarrow{\scriptstyle \alpha\_n} & & & & \\
q\_1' & \xrightarrow{\alpha\_1} & q\_2' & \xrightarrow{\alpha\_2} & q\_3' & \xrightarrow{\alpha\_3} & \cdots & \xrightarrow{\alpha\_{n-2}} & q\_{n-1}' & \xrightarrow{\alpha\_{n-1}} & q\_n & \xrightarrow{\alpha\_{n+1}} & q\_{n+1} & \xrightarrow{\alpha\_{n+2}} & \cdots
\end{array}$$

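Since α<sub>1</sub> ∈ en(q<sub>0</sub>) \ A, the ample set A is a proper subset of en(q<sub>0</sub>), so by **C2**, α<sub>n</sub> is invisible, whence L(q′<sub>j+1</sub>) = σ<sub>j</sub> for 0 ≤ j ≤ n − 2, and σ<sub>n−1</sub> = σ<sub>n</sub>. Hence the admissible sequence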

$$q\_0 \stackrel{\alpha\_n}{\rightarrow} q\_1' \stackrel{\alpha\_1}{\rightarrow} q\_2' \stackrel{\alpha\_2}{\rightarrow} q\_3' \rightarrow \cdots \stackrel{\alpha\_{n-2}}{\rightarrow} q\_{n-1}' \stackrel{\alpha\_{n-1}}{\rightarrow} q\_n \stackrel{\alpha\_{n+1}}{\rightarrow} q\_{n+1} \stackrel{\alpha\_{n+2}}{\rightarrow} q\_{n+2} \rightarrow \cdots \tag{4}$$

generates the word

$$
\sigma\_0 \sigma\_0 \sigma\_1 \sigma\_2 \cdots \sigma\_{n-2} \sigma\_n \sigma\_{n+1} \sigma\_{n+2} \cdots \ . \tag{5}
$$

Now the projection of θ<sup>i</sup> onto B has the form

$$s\_0 \xrightarrow{\sigma\_0} s\_1 \xrightarrow{\sigma\_1} s\_2 \xrightarrow{\sigma\_2} \cdots \xrightarrow{\sigma\_{n-2}} s\_{n-1} \xrightarrow{\sigma\_n} s\_n \xrightarrow{\sigma\_n} s\_{n+1} \xrightarrow{\sigma\_{n+1}} s\_{n+2} \xrightarrow{\sigma\_{n+2}} \cdots$$

since σ<sub>n−1</sub> = σ<sub>n</sub>. By Lemma 1, there is a path in ℬ

$$s\_0 \xrightarrow{\sigma\_0} s\_1 \xrightarrow{\sigma\_0} s\_1' \xrightarrow{\sigma\_1} s\_2 \xrightarrow{\sigma\_2} \cdots \xrightarrow{\sigma\_{n-2}} s\_{n-1} \xrightarrow{\sigma\_n} s\_n \xrightarrow{\sigma\_{n+1}} s\_{n+2} \xrightarrow{\sigma\_{n+2}} \cdots \tag{6}$$

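which accepts the word (5). Composing (4) and (6) therefore gives a path through the product space. Its first transition, labeled ⟨α<sub>n</sub>, σ<sub>0</sub>⟩, leaves ⟨q<sub>0</sub>, s<sub>0</sub>⟩ with α<sub>n</sub> ∈ A = amp(q<sub>0</sub>, s<sub>0</sub>), so it is a transition of the reduced space. Removing this first transition from the path yields θ<sub>i+1</sub>. Appending it to η<sub>i</sub> yields η<sub>i+1</sub>.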

*Case 2b:* No operation in A occurs in θ<sub>i</sub>. By **C0**, A is nonempty. Let β ∈ A. By **C1**, every operation in θ<sub>i</sub> is independent of β, and since A is a proper subset of en(q<sub>0</sub>), β is invisible by **C2**. With an argument similar to the one for Case 2a, we can see there is a path in the product space for which the projection onto the program component has the form

$$q\_0 \xrightarrow{\beta} q\_1' \xrightarrow{\alpha\_1} q\_2' \xrightarrow{\alpha\_2} q\_3' \to \cdots$$

and the projection onto the property component has the form

$$s\_0 \xrightarrow{\sigma\_0} s\_1 \xrightarrow{\sigma\_0} s\_1' \xrightarrow{\sigma\_1} s\_2 \xrightarrow{\sigma\_2} \cdots$$

Removing the first transition from this path yields θ<sub>i+1</sub>. Appending that transition to η<sub>i</sub> yields η<sub>i+1</sub>. This completes the definitions of η<sub>i+1</sub> and θ<sub>i+1</sub>.

Let η be the limit of the η<sub>i</sub>. Clearly η is an infinite path through the reduced product space, starting from the initial state. We must show that it passes through an accepting state infinitely often. To do so, we must examine more closely the sequence of property states through which each θ<sub>i</sub> passes.

Let i ≥ 0, and let s<sub>0</sub> be the final state of η<sub>i</sub>. Say θ<sub>i</sub> passes through the states s<sub>0</sub>s<sub>1</sub>s<sub>2</sub> ⋯. Then the final state of η<sub>i+1</sub> is s<sub>1</sub>, and the state sequence of θ<sub>i+1</sub> is determined by the three cases as follows:

$$\begin{array}{lll} \text{Case 1: } & s\_1 s\_2 \cdots & \\ \text{Case 2a: } & s\_1 s\_1' s\_2 \cdots s\_n s\_{n+2} \cdots & (s\_{n+1} \in F \implies s\_n \in F) \\ \text{Case 2b: } & s\_1 s\_1' s\_2 \cdots & \end{array} \tag{7}$$

We first claim that for all i ≥ 0, θ<sub>i</sub> passes through an accepting state infinitely often. This holds for θ<sub>0</sub>, which is an accepting path by assumption. Assume it holds for θ<sub>i</sub>. In each case of (7), the state sequence of θ<sub>i+1</sub> has a suffix which is also a suffix of the state sequence of θ<sub>i</sub>, so the claim holds for θ<sub>i+1</sub>.

**Definition 13.** *For any path* ξ = s<sub>0</sub> → s<sub>1</sub> → ⋯ *through* B *which passes through an accepting state infinitely often, define the* accepting distance of ξ*, written* AD(ξ)*, to be the minimum* k ≥ 1 *for which* s<sub>k</sub> *is accepting.*
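To make Definition 13 concrete, the following Python sketch computes AD(ξ) for a path represented as a lasso: a finite prefix followed by a cycle repeated forever. The lasso representation and the name `accepting_distance` are assumptions made for illustration; the proof itself does not require paths to be lasso-shaped. The example checks that, when s<sub>1</sub> is not accepting, dropping the first state lowers the accepting distance by exactly one, which is the situation of Case 1 in (7).

```python
from typing import List, Set


def accepting_distance(prefix: List[str], cycle: List[str], accepting: Set[str]) -> int:
    """AD(xi) for the lasso-shaped path xi = prefix, then cycle repeated forever.

    s_k is prefix[k] for k < len(prefix), otherwise an element of the cycle.
    Returns the least k >= 1 such that s_k is accepting (s_0 is never counted).
    """
    if not cycle or not any(s in accepting for s in cycle):
        # No accepting state on the cycle: xi does not visit F infinitely often.
        raise ValueError("path does not pass through an accepting state infinitely often")
    k = 1
    while True:
        if k < len(prefix):
            s_k = prefix[k]
        else:
            s_k = cycle[(k - len(prefix)) % len(cycle)]
        if s_k in accepting:
            return k
        k += 1


# Dropping the first transition (as in Case 1 of the proof) lowers AD by one.
ad_before = accepting_distance(["s0", "s1"], ["s2", "s3"], {"s3"})
ad_after = accepting_distance(["s1"], ["s2", "s3"], {"s3"})
```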

**Lemma 2.** *Let* i ≥ 0 *and say the state sequence of* θ<sub>i</sub> *is* s<sub>0</sub>s<sub>1</sub>s<sub>2</sub> ⋯*. If* s<sub>1</sub> *is not accepting then one of the following holds:*


*Proof.* If s<sub>1</sub> is not accepting, then AD(θ<sub>i</sub>) = k for some k ≥ 2, that is, s<sub>k</sub> is the first accepting state after s<sub>0</sub>. The result follows by examining (7). In Case 1, the accepting distance decreases by 1. In Case 2a, the accepting distance is either unchanged (if k ≤ n) or decreases by 1 (if k > n). In Case 2b, the accepting distance is unchanged.

**Lemma 3.** *For an infinite number of* i ≥ 0*, Case 1 holds for* θ<sub>i</sub>*.*

*Proof.* Suppose not. Then there is some i ≥ 0 such that Case 2 holds for all j ≥ i. Let α<sub>1</sub> be the first program operation of θ<sub>i</sub>. Then α<sub>1</sub> is the first program operation of θ<sub>j</sub> for all j ≥ i. Furthermore, for all j ≥ i, α<sub>1</sub> is not in the ample set of the final state of η<sub>j</sub>. Since the product space has only a finite number of states, this means there is a cycle in the reduced space on which α<sub>1</sub> is enabled but never in the ample set, contradicting **C3**<sup>1</sup>.

We now show that η passes through an accepting state infinitely often. Note that if AD(θ<sub>i</sub>) = 1, an accepting state is added to η<sub>i</sub> to form η<sub>i+1</sub>. Suppose η does not pass through an accepting state infinitely often. Then there is some i ≥ 0 such that AD(θ<sub>j</sub>) > 1 for all j ≥ i. By Lemma 2, (AD(θ<sub>j</sub>))<sub>j≥i</sub> is a nonincreasing sequence of positive integers, and by Lemma 3, this sequence strictly decreases infinitely often, a contradiction. This completes the proof of Theorem 1.

*Remark 1.* The proof goes through with minor modifications for V0 in place of V1. Let A = amp(q<sub>0</sub>, s<sub>1</sub>) instead of amp(q<sub>0</sub>, s<sub>0</sub>). In Case 2a (and similarly in Case 2b), note that the first transition s<sub>0</sub> → s<sub>1</sub> (labeled σ<sub>0</sub>) of the path in B remains in the new path (6).

#### **8 Summary of Experimental Results and Conclusion**

We have seen that standard ways of combining POR and on-the-fly model checking are unsound. This is not only a theoretical issue: the defect in the algorithm is realized in Spin, which can produce an incorrect result. A modification (V1) seems to help, but is still not enough to guarantee soundness for every Büchi automaton with a stutter-invariant language. However, any such automaton can be transformed into a normal form for which algorithm V1 is sound.


**Fig. 5.** Bounded verification of soundness of POR schemes V0 and V1 on various Büchi automata using Alloy. *B*<sub>5</sub> represents all automata in SI normal form within the bounds. Each run resulted in either a counterexample (✗) or not (✓).

Alloy proved useful for reasoning about the algorithms and generating small counterexamples. A summary of the Alloy experiments and results is given in Fig. 5. These were run on an 8-core 3.7 GHz Intel Xeon W-2145 and used the plingeling SAT solver [1].<sup>3</sup> In addition to the experiments already discussed, Alloy found no soundness counterexamples for property automata B<sub>3</sub> or B<sub>4</sub>, using V0 or V1. In the case of B<sub>4</sub>, this is what Theorem 1 predicts. For further confirmation of Theorem 1, I constructed a general Alloy model of Büchi automata in SI normal form, represented by B<sub>5</sub> in the table. Alloy confirms that both V0 and V1 are sound for all such automata within small bounds on program and automata size.

It is possible that the use of the normal form, while correct, cancels out the benefits of POR. A comprehensive exploration of this issue is beyond the scope of this paper, but I can provide data on one non-trivial example. I encoded an n-process version of Peterson's mutual exclusion algorithm in Promela, and used Spin to verify starvation-freedom for one process in the case n = 5. If p is the predicate that holds whenever the process is enabled, a trace violates this property if p holds only a finite number of times in the trace, i.e., if the trace is in L(B<sub>3</sub>) = L(B<sub>4</sub>). Figure 6 shows the results of Spin verification using B<sub>3</sub> without POR, and using B<sub>3</sub> and B<sub>4</sub> with POR. The results indicate that POR significantly improves performance on this problem, and that using the normal form B<sub>4</sub> in place of B<sub>3</sub> actually *improves* performance further by a small amount.


**Fig. 6.** Spin verification of starvation-freedom for 5-process Peterson. Using the SI normal form *B*<sub>4</sub> instead of the smaller *B*<sub>3</sub> has little impact on performance.

It is likely that V1 is sound for other interesting classes of automata. Observe, for example, that B<sub>2</sub> of Fig. 3 has states u for which the language of the automaton, with u considered as the initial state, is *not* stutter-invariant. If we restrict to automata in which every state has a stutter-invariant language, is V1 sound? I have neither a proof nor a counterexample. (This is certainly not true of V0, as B<sub>1</sub> is a counterexample.) To explore this question, it would help to find a way to encode the stutter-invariance property, or a suitable approximation of it, in Alloy.

Finally, the proof of Theorem 1 is complicated and might also be flawed. Recent work mechanizing such proofs [3] represents an important advance in raising the level of assurance in model checking algorithms. It would be interesting to see if the proof of this theorem is amenable to such methods. However, constructing such proofs requires far more effort than the Alloy approach described here. One possible approach moving forward is to use tools such as Alloy when prototyping a new algorithm, to get feedback quickly and root out bugs. Once Alloy no longer finds any counterexamples, one could then expend the considerable effort required to construct a formal mechanized proof.

<sup>3</sup> All artifacts needed to reproduce the experiments reported in this paper can be downloaded from http://vsl.cis.udel.edu/cav19.

**Acknowledgements.** I am grateful to Ganesh Gopalakrishnan and Julian Brunner for fruitful conversations on partial order reduction, to Gerard Holzmann for help with Spin, and to the anonymous reviewers for suggestions that improved this paper. This material is based upon work by the RAPIDS Institute, supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program. Funding was also provided by the U.S. National Science Foundation under award CCF-1319571.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Integrating Formal Schedulability Analysis into a Verified OS Kernel**

Xiaojie Guo<sup>1,2</sup>, Maxime Lesourd<sup>1,2</sup>, Mengqi Liu<sup>3</sup>, Lionel Rieg<sup>1,3</sup>(✉), and Zhong Shao<sup>3</sup>

<sup>1</sup> Univ. Grenoble Alpes, CNRS, Grenoble INP, VERIMAG, Grenoble, France <sup>2</sup> Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, Grenoble, France <sup>3</sup> Yale University, New Haven, CT, USA lionel.rieg@univ-grenoble-alpes.fr

**Abstract.** Formal verification of real-time systems is attractive because these systems often perform critical operations. Unlike in non-real-time systems, latency and response time guarantees are of critical importance in this setting, as much as functional correctness. Nevertheless, formal verification of real-time OSes usually stops the scheduling analysis at the policy level: such work only proves that the scheduler (or its abstract model) satisfies some scheduling policy. In this paper, we go further and connect Prosa, a verified schedulability analyzer, with RT-CertiKOS, a verified single-core sequential real-time OS kernel. Thus, we get a more general and extensible schedulability analysis proof for RT-CertiKOS, as well as a concrete implementation validating Prosa models. This work also demonstrates that it is realistic to connect two completely independent formal developments in a proof assistant.

**Keywords:** Formal methods · Proof assistant · Real-time scheduling · OS kernel · Schedulability analysis

#### **1 Introduction**

The real-time and OS communities have seen recent effort towards formal proofs, through several techniques such as model checking [16,22] and interactive theorem provers [7,14,17]. This trend is motivated by the high stakes of critical systems and the combinatorial complexity of considering all possible interleavings of states of a system, which makes pen-and-paper reasoning too error-prone.

Real-time OSes used in critical areas such as avionics and automotive applications must ensure not only functional correctness but also timing requirements. Indeed, a missed deadline may have catastrophic consequences. Schedulability analysis aims to guarantee the absence of deadline misses, given a scheduling algorithm that decides which task is going to execute.

In the current state of the art, schedulability analysis is decoupled from kernel code verification. This is good from a separation-of-concerns perspective, as kernel verification and schedulability analysis are each complex enough on their own. Nevertheless, this gap also means that each community may lack validation from the other.

On the one hand, schedulability analysis itself is error-prone: *e.g.,* a flaw was found in the original schedulability analysis [26,27,29] for the Controller Area Network bus, which is widely used in automobiles. To tackle this issue, the Prosa library [7] provides mechanized schedulability proofs. This library is developed with a focus on readable specifications in order to ensure wide acceptance by the community. It is currently a reference for mechanized schedulability proofs and was able to verify several existing multicore scheduling policies in a new setting with jitter. However, some of its design decisions, in particular for task models and scheduling policies, are highly unusual, and their adequacy to reality has never been justified by connecting them to a concrete OS kernel enforcing a real-time scheduling policy.

On the other hand, OS kernels are very sensitive and bug-prone pieces of code, which has inspired a lot of existing work on using formal methods to prove functional correctness and other requirements, such as access control policies [17], scheduling policies [31], timing requirements, etc. One such verified OS kernel is RT-CertiKOS [21], developed by the Yale FLINT group and built on top of the sequential CertiKOS [9,13]. Its verification focuses on extensions beyond pure functional correctness, such as real-time guarantees and isolation between components. However, any major extension, such as real-time support, adds a lot of proof burden.

In this paper, we solve both problems at once by combining the formal schedulability analysis given by Prosa with the functional correctness guarantees of RT-CertiKOS. Thus, we get a formal schedulability proof for this kernel: if it accepts a task set, then formal proofs ensure that there will be no deadline miss during execution. Furthermore, this work also produces a concrete instance of the definitions used in Prosa, ensuring their consistency and adequacy with a real system.

**Contributions.** In this paper, we make the following contributions:


**Outline of the Paper.** Section 2 introduces the Prosa library and its description of scheduling. In Sect. 3, we describe RT-CertiKOS, its scheduler, and the associated verification technique, abstraction layers. Section 4 then highlights the key differences between the models of Prosa and RT-CertiKOS, and how we resolve them. Finally, Sects. 5, 6, and 7 evaluate our work and present future work and related work before concluding.

#### **2 Prosa**

Prosa [7] is a Coq [25] library of models and analyses for real-time systems. The library is aimed towards the real-time community and provides models and analyses found in the literature with a focus on readable specifications.

**Fig. 1.** An overview of Prosa layers

The library contains four basic layers, which are presented in Fig. 1:


#### **2.1 System Behavior**

The basic definitions in Prosa concern concrete system behavior. The notion of time used in the library corresponds to scheduling ticks: durations are given as a number of ticks, and instants are given as a number of ticks since the initialization of the system. For this paper, we focus on single-core systems<sup>1</sup> on which instances of a finite set TaskSet of tasks are scheduled. Each task τ is associated with a relative deadline D<sub>τ</sub>, which corresponds to the delay we want to guarantee between the activation of an instance of the task and its completion. We defer the definition of tasks (Definition 4) until their parameters are relevant and focus first on the modeling of system behavior in Prosa. The instances of tasks that are to be scheduled are called *jobs*.

**Definition 1 (Job).** *A job* j *is defined by a* task τ<sub>j</sub>*, a positive cost* **c**<sub>j</sub>*, and a unique identifier.*

We do not use the identifier directly; it only serves to distinguish jobs of the same task in traces.

These jobs are used to describe the workload to be scheduled. This workload is defined by an *arrival sequence* which is a trace of job activations.

**Definition 2 (Arrival sequence).** *An* arrival sequence *is a function* ρ *mapping any time instant* t *to a finite (possibly empty) set of jobs* ρ(t)*.*

*A job can only appear once in an arrival sequence.*

Since a job j can appear at most once in an arrival sequence ρ, we can define its *arrival time* **a**<sub>ρ</sub>(j) in ρ as the instant t such that j ∈ ρ(t).

We do not model the scheduler as a function; instead, we work with *schedules* over an arrival sequence, which are traces of scheduled jobs.

**Definition 3 (Schedule).** *A* schedule *over an arrival sequence* ρ *is a function* σ *which maps any time instant* t *to either a job appearing in* ρ *or* ⊥*.*

The symbol ⊥ is used for instants at which no job is scheduled. Given an arrival sequence ρ and a schedule σ over ρ, a job j ∈ ρ is said to be *scheduled at an instant* t if σ(t) = j. The *service* received by j up to time t is the number of instants before t at which j is scheduled. A job j is said to be *complete at time* t if the service it has received up to time t is equal to its cost **c**<sub>j</sub>, and j is said to be *pending at time* t if it has arrived before time t and is not complete at time t. From now on, we require schedules to only schedule pending jobs. A job j is said to be *schedulable* if it is complete by its *absolute deadline* d<sub>j</sub> := **a**<sub>ρ</sub>(j) + D<sub>τ<sub>j</sub></sub>.
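The definitions of this subsection can be prototyped outside Coq. The Python sketch below is a hypothetical, finite-horizon rendering (Prosa states these notions in Coq over unbounded time): `Job` mirrors Definition 1, arrival sequences and schedules are plain functions from instants to sets of jobs or to jobs, and `None` stands for ⊥. All names are illustrative, and reading "arrived before time t" as "arrival instant ≤ t" is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Job:
    task: str   # the task tau_j this job belongs to
    cost: int   # positive cost c_j
    ident: int  # unique identifier, only used to tell jobs of the same task apart


ArrivalSeq = Callable[[int], FrozenSet[Job]]  # rho: instant -> finite set of jobs
Schedule = Callable[[int], Optional[Job]]     # sigma: instant -> job; None plays the role of bottom


def arrival_time(rho: ArrivalSeq, j: Job, horizon: int) -> Optional[int]:
    """The unique instant t < horizon with j in rho(t), or None if j has not arrived."""
    return next((t for t in range(horizon) if j in rho(t)), None)


def service(sigma: Schedule, j: Job, t: int) -> int:
    """Number of instants strictly before t at which j is scheduled."""
    return sum(1 for u in range(t) if sigma(u) == j)


def complete(sigma: Schedule, j: Job, t: int) -> bool:
    # For schedules that only run pending jobs, service never exceeds the cost.
    return service(sigma, j, t) >= j.cost


def pending(rho: ArrivalSeq, sigma: Schedule, j: Job, t: int) -> bool:
    # Assumption: "arrived before time t" is read as arrival instant <= t.
    return arrival_time(rho, j, t + 1) is not None and not complete(sigma, j, t)


def schedulable(rho: ArrivalSeq, sigma: Schedule, j: Job,
                rel_deadline: Dict[str, int], horizon: int) -> bool:
    """Is j complete by its absolute deadline d_j = a_rho(j) + D_{tau_j}?"""
    a = arrival_time(rho, j, horizon)
    if a is None:
        raise ValueError("job does not arrive within the horizon")
    return complete(sigma, j, a + rel_deadline[j.task])


# One job of task "T" with cost 2, arriving at 0 and run at instants 0 and 2.
j1 = Job(task="T", cost=2, ident=1)


def rho(t: int) -> FrozenSet[Job]:
    return frozenset({j1}) if t == 0 else frozenset()


def sigma(t: int) -> Optional[Job]:
    return j1 if t in (0, 2) else None
```

Here j1 is complete at time 3, so it meets a relative deadline of 4 but misses a relative deadline of 2.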

<sup>1</sup> Multicore systems are handled by Prosa but we do not consider them here.

#### **2.2 System Model**

**Task Model.** In order to specify the behavior of the system we are interested in, Prosa introduces predicates on traces for which the response time analysis provides guarantees.

We now focus on the definitions related to the *sporadic* task model and the *fixed priority preemptive* (FPP) scheduling policy.

**Definition 4 (Sporadic FPP task).** *A sporadic FPP task* τ *is defined by a deadline* D<sub>τ</sub> ∈ ℕ*, a minimal inter-arrival time* δ<sup>−</sup><sub>τ</sub> ∈ ℕ*, a worst case execution time (WCET)* **C**<sub>τ</sub>*, and a priority* p<sub>τ</sub> ∈ ℕ*. When* D<sub>τ</sub> *is equal to* δ<sup>−</sup><sub>τ</sub>*, the deadline is said to be* implicit*.*

**Sporadic Task Model.** The sporadic task model is specified by a sporadic arrival model and a cost model.

In the sporadic arrival model, consecutive activations of a task τ are separated by a minimum distance δ<sup>−</sup><sub>τ</sub>: an arrival sequence ρ is sporadic if for any two distinct jobs j<sub>1</sub>, j<sub>2</sub> ∈ ρ of the same task τ, |**a**<sub>ρ</sub>(j<sub>1</sub>) − **a**<sub>ρ</sub>(j<sub>2</sub>)| ≥ δ<sup>−</sup><sub>τ</sub>. Periodic arrivals are a particular case of this model, where δ<sup>−</sup><sub>τ</sub> is the period and jobs arrive exactly at intervals of δ<sup>−</sup><sub>τ</sub>. This is sufficient for us, as the schedulability analysis for FPP yields the same bounds for sporadic and periodic activations.

The considered cost model is a constraint on activations: jobs in the arrival sequence must respect the WCET of their task, that is, for any j ∈ ρ, **c**<sub>j</sub> ≤ **C**<sub>τ<sub>j</sub></sub>.
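Both constraints of the sporadic task model are easy to check on a finite arrival sequence. In this Python sketch, an arrival sequence is assumed to be encoded as a list of hypothetical `(task, arrival, cost)` triples rather than as a function from instants to job sets; `is_sporadic` tests the minimum-separation condition, `respects_wcet` the cost model, and `periodic_jobs` builds the strictly periodic special case.

```python
from typing import Dict, List, Tuple

# Hypothetical flat encoding of an arrival sequence: one (task, arrival, cost) triple per job.
JobRec = Tuple[str, int, int]


def is_sporadic(jobs: List[JobRec], min_sep: Dict[str, int]) -> bool:
    """Distinct jobs of the same task arrive at least delta^-_tau apart."""
    arrivals: Dict[str, List[int]] = {}
    for task, arrival, _cost in jobs:
        arrivals.setdefault(task, []).append(arrival)
    for task, times in arrivals.items():
        times.sort()
        # Once sorted, checking neighbours is enough: the closest pair is adjacent.
        if any(b - a < min_sep[task] for a, b in zip(times, times[1:])):
            return False
    return True


def respects_wcet(jobs: List[JobRec], wcet: Dict[str, int]) -> bool:
    """Cost model: c_j <= C_{tau_j} for every job."""
    return all(cost <= wcet[task] for task, _arrival, cost in jobs)


def periodic_jobs(task: str, period: int, cost: int, horizon: int) -> List[JobRec]:
    """Periodic releases: the special case of sporadic arrivals with exact spacing."""
    return [(task, t, cost) for t in range(0, horizon, period)]


jobs = periodic_jobs("T1", period=5, cost=2, horizon=20)  # arrivals 0, 5, 10, 15
```

A periodic sequence with period 5 is sporadic for δ<sup>−</sup> = 5 but not for δ<sup>−</sup> = 6, which reflects that the period is the tightest admissible minimum inter-arrival time.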

**FPP Scheduling Policy.** The FPP policy is modeled in Prosa as two constraints on the schedule: it must be *work conserving*, that is, it cannot be idle when there are pending jobs; and it must respect the priorities, that is, a scheduled job always has the highest priority among pending jobs.
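The two FPP constraints can likewise be stated as a check on a finite schedule prefix. The sketch below is illustrative only: it assumes the set of pending jobs at each instant has already been computed, and that a larger number means a higher priority (consistent with the p<sub>τ′</sub> ≥ p<sub>τ</sub> convention of Definition 5). Ties in priority are allowed, since only a *strictly* higher-priority pending job counts as a violation.

```python
from typing import Dict, List, Optional, Set


def respects_fpp(schedule: List[Optional[str]],
                 pending: List[Set[str]],
                 priority: Dict[str, int]) -> bool:
    """Check the two FPP constraints on a finite schedule prefix.

    schedule[t] is the job run at instant t (None means idle), pending[t] the set of
    jobs pending at t, and priority[j] the priority of j's task (larger = higher).
    """
    for t, job in enumerate(schedule):
        if job is None:
            if pending[t]:
                return False  # not work conserving: idle while some job is pending
        elif job not in pending[t]:
            return False      # only pending jobs may be scheduled
        elif any(priority[other] > priority[job] for other in pending[t]):
            return False      # a strictly higher-priority pending job is waiting
    return True


prio = {"hi": 2, "lo": 1}
pend = [{"hi", "lo"}, {"lo"}, set()]
```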

#### **2.3 Analysis**

Prosa contains a proof of Bertogna and Cirinei's [4] response time analysis for FPP single-core schedules of sporadic tasks, with exact bounds for implicit deadlines. The analysis is based on the following property of the maximum workload for these schedules.

**Definition 5 (Maximum Workload).** *Given a task* τ ∈ TaskSet *and a duration* Δ*, the maximum workload of the system* w.r.t. τ *within that duration is*

$$W\_{\tau}(\Delta) := \sum\_{\substack{\tau' \in \mathit{TaskSet} \\ p\_{\tau'} \ge p\_{\tau}}} \mathbf{C}\_{\tau'} \times \left\lceil \frac{\Delta}{\delta\_{\tau'}^{-}} \right\rceil$$

The maximum workload W<sub>τ</sub>(Δ) corresponds to the worst-case activation pattern, in which all tasks are activated simultaneously with maximum cost (the WCET of their task) and minimal inter-arrival distance. It is an upper bound on the amount of service required to schedule activations of the tasks with a priority higher than or equal to p<sub>τ</sub> in any interval of size Δ. Based on this property, we can derive a response time bound for our system model if we can find a Δ greater than or equal to W<sub>τ</sub>(Δ).
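Definition 5 translates directly into code. The following Python sketch, with a hypothetical `Task` record, computes W<sub>τ</sub>(Δ), assuming the bracket in the formula denotes the ceiling ⌈Δ/δ<sup>−</sup><sub>τ′</sub>⌉ (the number of activations of τ′ that can fall in a window of length Δ). Integer arithmetic avoids floating-point rounding.

```python
from typing import Dict, NamedTuple


class Task(NamedTuple):
    wcet: int      # C_tau
    min_sep: int   # delta^-_tau, minimal inter-arrival time
    priority: int  # p_tau; larger means higher priority, matching p_tau' >= p_tau


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for b > 0, avoiding floating point."""
    return -(-a // b)


def max_workload(tasks: Dict[str, Task], tau: str, delta: int) -> int:
    """W_tau(delta): sum of C_tau' * ceil(delta / delta^-_tau') over tau' with p_tau' >= p_tau."""
    p = tasks[tau].priority
    return sum(t.wcet * ceil_div(delta, t.min_sep)
               for t in tasks.values() if t.priority >= p)


tasks = {"A": Task(wcet=1, min_sep=4, priority=2),
         "B": Task(wcet=2, min_sep=6, priority=1)}
```

In this example W<sub>B</sub>(3) = 1 + 2 = 3, so R = 3 satisfies R ≥ W<sub>B</sub>(R), and the response time bound stated next applies with R = 3 for every job of B.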

**Theorem 1 (Response Time Bound).** *Given a sporadic task set* TaskSet *and a task* τ ∈ TaskSet*, for any* R > 0 *such that* R ≥ W<sub>τ</sub>(R)*, any job* j *of task* τ *in an FPP schedule* σ *over an arrival sequence* ρ *is completed by* **a**<sub>ρ</sub>(j) + R*.*

For instance, the smallest response time bound for a task τ ∈ TaskSet is given by the least positive fixed point of the function W<sub>τ</sub>. Using this response time bound, we can derive a *schedulability criterion* by requiring this bound to be smaller than or equal to the deadline of task τ.
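The least positive fixed point can be computed by the classical iteration of response time analysis. The sketch below is an illustration under stated assumptions, not Prosa's Coq development: it starts from R = **C**<sub>τ</sub>, repeatedly applies W<sub>τ</sub>, and stops either at a fixed point or as soon as an iterate exceeds D<sub>τ</sub>, in which case the schedulability criterion fails. Task parameters are assumed to be positive integers, and larger numbers denote higher priorities.

```python
from typing import Dict, NamedTuple, Optional


class Task(NamedTuple):
    wcet: int      # C_tau (assumed positive)
    min_sep: int   # delta^-_tau
    priority: int  # p_tau; larger means higher priority
    deadline: int  # D_tau


def max_workload(tasks: Dict[str, Task], tau: str, delta: int) -> int:
    p = tasks[tau].priority
    return sum(t.wcet * -(-delta // t.min_sep)   # -(-a // b) computes ceil(a / b)
               for t in tasks.values() if t.priority >= p)


def response_time_bound(tasks: Dict[str, Task], tau: str) -> Optional[int]:
    """Least positive fixed point R = W_tau(R) if it is <= D_tau, else None.

    Iteration starts at R = C_tau, a lower bound on every positive fixed point
    (W_tau(R) >= C_tau for R > 0); since W_tau is monotone, the iterates increase
    until they reach the least fixed point or exceed the deadline.
    """
    r = tasks[tau].wcet
    while True:
        w = max_workload(tasks, tau, r)
        if w > tasks[tau].deadline:
            return None   # schedulability criterion fails for tau
        if w == r:
            return r
        r = w


ok = {"A": Task(1, 4, 2, 4), "B": Task(2, 6, 1, 6)}
overloaded = {"A": Task(1, 2, 2, 2), "C": Task(2, 3, 1, 3)}
```

With these parameters, task B in `ok` has response time bound 3 ≤ 6, while task C in `overloaded` (total utilization 1/2 + 2/3 > 1) fails the criterion.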

#### **2.4 Implementation and Motivation for the Connection with RT-CertiKOS**

The Prosa library includes functions to generate periodic traces and the corresponding FPP schedules, together with proofs of these properties and an instantiation of the schedulability criterion for these traces. This implementation was initially provided as a way to check that the models of the arrival process and of the scheduling policy are not contradictory, and as such it is as simple as possible. Although this is a good step towards making the axiomatic definitions of scheduling policies more acceptable, there is still room for improvement: these implementations are still rather ad hoc, and there is no connection to an actual system. This is where the link with RT-CertiKOS benefits the Prosa ecosystem: it justifies that the model is indeed suitable for a concrete and independently developed real-time OS scheduler.

#### **3 The RT-CertiKOS OS Kernel**

RT-CertiKOS [21], developed by the Yale FLINT group, is a real-time extension of the single-core sequential CertiKOS [9,13],<sup>2</sup> whose functional correctness has been mechanized in the Coq proof assistant [25]. The sequential restriction greatly simplifies the implementation of the OS kernel. However, it does not support multicore, and the lack of kernel preemption can also degrade the responsiveness of the whole system. RT-CertiKOS proves spatial and temporal isolation (including schedulability) between components.

Both CertiKOS and RT-CertiKOS follow the same proof methodology, organized around the notion of abstraction layers that permits decomposition of the kernel into small pieces that are easier to verify.

<sup>2</sup> There is a multicore version of CertiKOS [14,15], but RT-CertiKOS is developed on top of the sequential version.

#### **3.1 Abstraction Layers**

*Abstraction layers* [13] are essentially a way to combine code fragments and their interface with simulation proofs. They consist of four elements: *(a)* a piece of code; *(b)* an *underlay*, the interface that the code relies on; *(c)* an *overlay*, the interface that the code provides; *(d)* a *simulation proof* ensuring that the code running on top of the underlay indeed provides the functionalities described in the overlay.

Implementation details of lower layers are encapsulated in higher layers, allowing one to reason directly with the specifications rather than the implementation.

Notice that the underlay and overlay are specifications written in Coq and may be expressed using the semantics of several programming languages at once. This explains how CertiKOS (and RT-CertiKOS) manages to encompass both C and assembly code verification into a unified framework. Notice further that this notion of interface not only includes functions but also some *abstract state*, which exposes memory states of lower layers in a clean and structured way, and allows the overlay to access them only by invoking verified functions.
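To illustrate the four components of an abstraction layer, here is a deliberately tiny Python analogue with hypothetical names: an underlay exposing a quota array only through accessor functions, a piece of code written against that underlay, and an overlay specification given as a pure function on the abstract state. In CertiKOS the simulation is a Coq proof; here it is replaced by an exhaustive check within small bounds, which can find mismatches but proves nothing beyond those bounds.

```python
from itertools import product
from typing import Tuple

State = Tuple[int, ...]  # abstract state: one remaining quota per task


class Underlay:
    """Hypothetical underlay: the quota array is reachable only through accessors."""

    def __init__(self, quotas: State):
        self._quotas = list(quotas)

    def get_quota(self, i: int) -> int:
        return self._quotas[i]

    def set_quota(self, i: int, value: int) -> None:
        self._quotas[i] = value

    def state(self) -> State:
        return tuple(self._quotas)


def consume_code(u: Underlay, i: int) -> bool:
    """The layer's code: uses one time slot of task i's quota via the underlay."""
    q = u.get_quota(i)
    if q == 0:
        return False
    u.set_quota(i, q - 1)
    return True


def consume_spec(s: State, i: int) -> Tuple[State, bool]:
    """Overlay specification: a pure function on the abstract state."""
    if s[i] == 0:
        return s, False
    return s[:i] + (s[i] - 1,) + s[i + 1:], True


def simulates(n_tasks: int, max_quota: int) -> bool:
    """Exhaustive check within small bounds (not a proof) that the code running
    on the underlay produces exactly the result and state the overlay specifies."""
    for s in product(range(max_quota + 1), repeat=n_tasks):
        for i in range(n_tasks):
            u = Underlay(s)
            result = consume_code(u, i)
            if (u.state(), result) != consume_spec(s, i):
                return False
    return True
```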

#### **3.2 The Scheduler in RT-CertiKOS**

RT-CertiKOS supports user-level fixed-priority preemptive scheduling. Its scheduler is invoked periodically by timer interrupts, dividing CPU time into intervals called *time slots*, *time quanta*, or *time slices*.

**Task Model.** Each task in RT-CertiKOS is defined by a fixed priority, a period, and a budget (or WCET), the latter two being given in time slot units. Tasks are strictly periodic, with implicit hard deadlines, that is, the deadlines are the start of the next period and no deadline miss is allowed at all. While this is a restricted setting, it is enough to handle closed-loop control, as used in real-time control systems. Furthermore, RT-CertiKOS only allows for fixed priorities in order to get maximum predictability, which is of utmost importance in critical systems. Finally, RT-CertiKOS also enforces budgets at the task level: in each period, a task cannot be scheduled for more than its specified budget.

**Fixed-Priority Scheduler.** The RT-CertiKOS scheduler maintains an integer array to keep track of time quantum usage for each task. Upon invocation, the scheduler first iterates over all tasks, replenishing quotas whenever a new period arrives. It then loops again and finds the highest priority task that has not used up its budget, followed by a decrement on the chosen task's current quota. Its abstraction is a Coq function that iterates over an abstract array of task control blocks, updates them, and returns the highest task identifier available for scheduling.

**Yield System Call.** Tasks do not always use up their budgets. A task can yield to relinquish any remaining quota, so that lower priority tasks may be scheduled earlier and more time slots may be dedicated to non real-time tasks.
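The scheduler and the yield system call described above can be simulated in a few lines. This Python sketch is not the RT-CertiKOS C code or its Coq specification; it assumes that tasks are numbered by priority level (a larger number meaning a higher priority), that every period starts at a multiple of its length, and that the scheduler is called exactly once per time slot. It also records each decision in a `trace` list, similar in spirit to a ghost schedule trace.

```python
from typing import List, Optional


class QuotaScheduler:
    """Sketch of a quota-based fixed-priority scheduler. Tasks are identified with
    priority levels 0..n-1 (larger = higher priority); periods and budgets are in
    time slots, and every period is assumed to start at a multiple of its length."""

    def __init__(self, periods: List[int], budgets: List[int]):
        self.periods, self.budgets = periods, budgets
        self.quota = [0] * len(periods)       # remaining quota in the current period
        self.time = 0                         # current time slot
        self.trace: List[Optional[int]] = []  # record of decisions, like a ghost variable

    def schedule(self) -> Optional[int]:
        """Called once per time slot, e.g. from the timer interrupt handler."""
        # First loop: replenish the quota of every task whose new period starts now.
        for p, period in enumerate(self.periods):
            if self.time % period == 0:
                self.quota[p] = self.budgets[p]
        # Second loop: pick the highest priority level with quota left, if any.
        ready = [p for p, q in enumerate(self.quota) if q > 0]
        chosen = max(ready) if ready else None
        if chosen is not None:
            self.quota[chosen] -= 1
        self.trace.append(chosen)  # None: the slot is free for non-real-time work
        self.time += 1
        return chosen

    def yield_(self, p: int) -> None:
        """Yield system call: task p gives up the rest of its quota for this period."""
        self.quota[p] = 0


s = QuotaScheduler(periods=[6, 4], budgets=[2, 1])
for _ in range(6):
    s.schedule()

s2 = QuotaScheduler(periods=[6, 4], budgets=[2, 1])
s2.schedule()
s2.schedule()
s2.yield_(0)
s2.schedule()
```

With periods 6 and 4 and budgets 2 and 1, the higher-priority task 1 runs first in each of its periods, task 0 uses its two slots afterwards, and the remaining slots are idle; calling `yield_(0)` after task 0's first slot leaves the third slot free.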

#### **3.3 Proof Methodology**

Based on sequential CertiKOS, RT-CertiKOS [21] follows the idea of deep specifications<sup>3</sup> in which the specification should be rich enough to deduce any property of interest: there should never be any need to consider the implementation. In particular, even though its source code is written in both C and assembly, the underlay always abstracts the concrete memory states it operates on into abstract states, and abstracts concrete code into Coq functions that act as executable specification. Subsequent layers relying on this underlay will invoke Coq functions instead of the concrete code, thus hiding implementation details.

In the case of scheduling, there are essentially two functions: the scheduler and the yield system call. The scheduler relies on two concrete data structures: a counter tracking the current time (in time slot units) and an array tracking the current quota for each periodic task. The yield system call simply sets the remaining quota of the current task to zero. Both functions are verified in RT-CertiKOS, that is, formal proofs ensure that their C code implementations indeed simulate the corresponding Coq specifications.

#### **3.4 Motivation for the Connection with Prosa**

Upgrading an OS kernel into a real-time one is not an easy task. When one further adds formal proofs about functional correctness, isolation, and timing requirements, the proof burden becomes enormous. In particular, there is still room for future work on RT-CertiKOS, *e.g.,* a WCET analysis of its system calls.

In order to reduce the overall proof burden, it is important to try to delegate as much as possible to specialized libraries and tools. Thus, from the RT-CertiKOS perspective, the benefit of using Prosa is precisely to have state-of-the-art schedulability analyses already mechanized in Coq, without having to prove all these results.

Furthermore, the schedulability check of Prosa is only performed once while verifying the proofs, such that there is no runtime overhead and no loss of performance for RT-CertiKOS.

#### **4 From RT-CertiKOS to Prosa: A Schedule Connection**

Prosa definitions cannot apply to RT-CertiKOS directly. Indeed, the perspectives of Prosa and RT-CertiKOS on the real-time aspects of a system are not the same, which is reflected in the differences in their task models, their executions, and the information they need. In this section, we explain how we bridge these gaps to actually perform the connection. Table 1 summarizes the various definitions and proofs and how they relate to each other.

<sup>3</sup> https://deepspec.org/.


**Table 1.** Summary of the range of the various data between RT-CertiKOS and Prosa

#### **4.1 Interface Between RT-CertiKOS and Prosa**

We design an interface to link RT-CertiKOS and Prosa, focusing on the precise amount of information that needs to be transmitted between them. The interface is shaped by the information Prosa needs to perform the schedulability analysis: a task set and a schedule, together with some properties.

**Key Elements of the Interface.** The task model we consider is the one of RT-CertiKOS, as it is more restrictive than the ones supported by Prosa. Tasks are defined by a priority level p, a period T<sub>p</sub>, and a WCET (more accurately a budget) C<sub>p</sub>. Since we only allow one task per priority level, we identify tasks and priority levels and we write C<sub>p</sub>, D<sub>p</sub>, and T<sub>p</sub> instead of C<sub>τ</sub>, D<sub>τ</sub>, and T<sub>τ</sub>. In order for this setting to make sense, we assume the following inequality for each task p: 0 < C<sub>p</sub> ≤ T<sub>p</sub>. Notice that this is a particular case of Prosa's FPP task model (Definition 4). There is no separate definition of the jobs of a task, as they can easily be identified by a task and a period number.
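A minimal Python rendering of this task model might look as follows. The names `IfaceTask` and `job_at` are hypothetical, and the sketch assumes that the k-th period of task p is the interval [k·T<sub>p</sub>, (k+1)·T<sub>p</sub>), so that a job is fully identified by the pair (p, k) and needs no separate definition.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class IfaceTask:
    """A task of the interface, identified with its priority level p."""
    p: int       # priority level
    period: int  # T_p; the deadline D_p is implicit, i.e. equal to T_p
    budget: int  # C_p, used as the WCET

    def __post_init__(self) -> None:
        if not 0 < self.budget <= self.period:
            raise ValueError("expected 0 < C_p <= T_p")

    @property
    def deadline(self) -> int:
        return self.period


def job_at(task: IfaceTask, t: int) -> Tuple[int, int]:
    """The job of `task` whose period contains slot t, named (task, period number)."""
    return (task.p, t // task.period)


t1 = IfaceTask(p=1, period=4, budget=1)
```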

The second element Prosa needs is an infinite schedule. RT-CertiKOS cannot provide such an infinite schedule, as only a finite prefix can be known, up to the current time. Thus, we keep RT-CertiKOS's finite schedule as is in the interface and it is up to Prosa to extend it into an infinite one, suitable for its analysis.

Finally, Prosa needs two properties about the schedule: *(a)* any task receives no more service than its WCET in any period; *(b)* the schedule indeed follows the FPP policy. We refer to schedules satisfying these properties as *valid schedule prefixes*. Proving these properties falls to RT-CertiKOS.

**Handling Service and Job Cost.** In RT-CertiKOS, and more generally in any OS, we only assume a bound on the execution time of a task, used as a budget. The exact execution time of each of its jobs is not known beforehand and can be observed only at runtime. In contrast, Prosa assumes that the costs of all jobs of all tasks are part of the problem description and thus are available from the start.

To fix this mismatch, we define a job cost function computed from a schedule prefix: its value is the actual service received if the job has yielded and the WCET of its task otherwise. This definition relies on the computation of service in any period, which we also provide as part of the interface.
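The job cost function can be sketched as follows. The encoding is an assumption made for illustration: a schedule prefix is a list whose entry t is the priority level scheduled in slot t (or `None` when idle), and the set `yielded` records which jobs, named (task, period number), have yielded. Periods are assumed to start at multiples of T<sub>p</sub>.

```python
from typing import List, Optional, Set, Tuple

Prefix = List[Optional[int]]  # prefix[t]: task (priority level) run in slot t, None if idle
JobId = Tuple[int, int]       # (task, period number)


def service_in_period(prefix: Prefix, p: int, period: int, k: int) -> int:
    """Slots of period k of task p, within the known prefix, in which p was scheduled."""
    start, end = k * period, min((k + 1) * period, len(prefix))
    return sum(1 for t in range(start, end) if prefix[t] == p)


def job_cost(prefix: Prefix, yielded: Set[JobId], p: int, period: int,
             budget: int, k: int) -> int:
    """Actual service if job (p, k) has yielded, the WCET (budget) of its task otherwise."""
    if (p, k) in yielded:
        return service_in_period(prefix, p, period, k)
    return budget


prefix: Prefix = [1, 0, None]  # task 1 runs, then task 0 runs once and yields
yielded = {(0, 0)}
```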

#### **4.2 The RT-CertiKOS Side**

**Adding the Schedule in RT-CertiKOS.** RT-CertiKOS only maintains the current state of the system that the scheduler relies on, such as the current time and the quota array. However, the interface requires a schedule trace. We introduce such a trace as a ghost variable in RT-CertiKOS and update a few scheduling-related primitives to extend this trace whenever a task is scheduled.

This introduction adds absolutely no proof overhead, since it does not affect the scheduling decisions, thus existing proofs about the rest of the system still hold. Furthermore, it is a purely logical variable introduced through refinement, meaning that it does not exist in the C code, thus it causes no computation overhead.

**Too Much Information in RT-CertiKOS.** The full RT-CertiKOS model contains too much information compared to what the interface requires.

Firstly, services in RT-CertiKOS may affect parts of the state, such as batch tasks, that are relevant to practical scheduling but of no interest to the scheduling model we want to verify.

Secondly, due to the nature of *deep specification*, the abstraction of the whole scheduling operation contains more information than what is required for reasoning about real-time properties. For example, saving and restoring registers is essential for the correctness of context switches (thus, of the scheduler), but it is irrelevant to temporal properties.

Thirdly, specifications in RT-CertiKOS enumerate preconditions of the scheduler such as the correct configuration of the paging bit in the control register, the validity of the current stack and so on. These are required for other invariants of the kernel at other abstraction levels, but again they are irrelevant to scheduling.

**Simplified Model of RT-CertiKOS.** For all these reasons, we define a simplified scheduling model of RT-CertiKOS, with a much simpler abstract state containing only the data structures that are actually used in scheduling, from which the interface data and its properties must be derived. This simplified abstract state contains four fields:


This abstract state is not equivalent to the complete one, because it operates on a totally different abstract data type where all irrelevant fields are removed. It is also more permissive: more transitions are allowed since it does not perform the sanity checks about preconditions such as being in kernel mode, host mode, etc. Nevertheless, we still have a simulation: any step in the full RT-CertiKOS is also allowed in the simplified version and results in the same scheduling decision and trace. This simulation is enough for our purposes as we are ultimately interested in the behavior of the full RT-CertiKOS.

**Proving the Properties Required by Prosa.** The interface requires two key properties: *(a)* the service received by each job is at most the WCET of its task; and *(b)* the schedule prefix follows FPP. These properties must be proven on the RT-CertiKOS side for any schedule that might be generated. This way, Prosa can rely on them through the interface.

Since RT-CertiKOS verification is based on state invariants rather than traces, we prove these properties using the following main invariants on the simplified scheduling model:


To prove that these statements are indeed invariants, we must prove that they are preserved by any step, that is, by the scheduler (triggered by the user-level timer interrupt) and by the yield system call (triggered by the user process), since all other kernel steps do not modify the scheduling data of the simplified scheduling model.

**Simulation Between the Simplified Scheduling Model and RT-CertiKOS.** To connect the full RT-CertiKOS model and the simplified one, we define a projection function *RData_proj* extracting the relevant fields from the full RT-CertiKOS state to build the simplified one.

As shown in Fig. 2, we prove that given a scheduler transition of RT-CertiKOS between the (full) states $d$ and $d'$, there is also a transition from their projections $s$ and $s'$ by invoking the simplified scheduler.<sup>4</sup> If the states $d$ and $s$ satisfy the invariants for RT-CertiKOS and the simplified model respectively, then so do $d'$ and $s'$ (they are invariants). As the states $s$ and $s'$ are projections of $d$ and $d'$, the invariants of $s$ and $s'$ also hold on the corresponding fields in $d$ and $d'$. This allows us to use the invariants proved in the simplified model to establish properties on the full state of RT-CertiKOS. Notice that the schedulability property we study is a safety property (deadlines are never missed) and not a liveness one (everything is eventually scheduled).

<sup>4</sup> More precisely, we prove that *certikos_sched*($s$) and *RData_proj*($d'$) are *extensionally* equal.

**Fig. 2.** Simulation between simplified scheduling and RT-CertiKOS

#### **4.3 The Prosa Side**

**Proven Schedulability Analysis in Prosa.** In order to use the response time bound of Sect. 2, we need to relate any finite schedule prefix from the interface to an arrival sequence and a schedule satisfying the model described in Sect. 2. We can then rely on any schedulability criterion (*e.g.,* the one described at the end of Sect. 2.3) to prove that the response time bound holds and deduce that any valid schedule prefix from the interface is indeed schedulable.
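The criterion of Sect. 2.3 is not reproduced in this section. Purely as an illustration of what such a criterion computes, here is a C sketch of the classic fixed-point response-time analysis for FPP with implicit deadlines. It iterates $R_i = C_i + \sum_{j < i} \lceil R_i / T_j \rceil C_j$ and declares task $i$ schedulable if the fixed point satisfies $R_i \le T_i$. This is a standard textbook test and not necessarily the exact criterion proved in Prosa. Tasks are assumed sorted by decreasing priority with positive WCETs.

```c
#include <stdbool.h>

/* Hypothetical task record; implicit deadline = period, wcet > 0. */
struct task { unsigned long period, wcet; };

/* Response-time bound of task i (tasks[0] has the highest priority),
 * or 0 if the iteration exceeds the deadline. */
static unsigned long response_time(const struct task *ts, int i)
{
    unsigned long r = ts[i].wcet, prev = 0;
    while (r != prev) {
        if (r > ts[i].period) return 0;       /* deadline missed */
        prev = r;
        r = ts[i].wcet;
        for (int j = 0; j < i; j++)           /* higher-priority interference */
            r += (prev + ts[j].period - 1) / ts[j].period * ts[j].wcet;
    }
    return r;
}

static bool schedulable(const struct task *ts, int n)
{
    for (int i = 0; i < n; i++)
        if (response_time(ts, i) == 0) return false;
    return true;
}
```

For the task set (T, C) = (4, 1), (6, 2), (12, 3), the iteration converges to bounds 1, 3, and 10, all within their periods.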

**Bridging the Gap Between the Interface and Prosa.** The interface provides Prosa with a task set, service and job cost functions, and a valid schedule prefix. We first build an arrival sequence from the schedule prefix where the $n$-th job ($n > 0$) of a given task $p$ arrives at time $(n - 1) \times T_p$ with the cost given by the interface. Note that jobs that do not arrive within the prefix cannot have yielded yet, so their cost is the WCET of their task: we assume the worst case for the future.

The arrival sequence is then defined by adding all jobs of each task $p$ from *TaskSet*, that is, the arrival sequence at time $t$ contains the $(t/T_p + 1)$-th job of $p$ iff $t$ is divisible by $T_p$.
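The arrival sequence reduces to arithmetic on the period. A small C sketch with hypothetical function names, using 1-based job indices as in the text and assuming positive periods:

```c
/* 1-based index of the job of a task with period T arriving at time t,
 * or 0 if no job of that task arrives at t. */
static unsigned long arriving_job(unsigned long period, unsigned long t)
{
    return (t % period == 0) ? t / period + 1 : 0;
}

/* Arrival time of the n-th job (n >= 1): (n - 1) * T. */
static unsigned long arrival_time(unsigned long period, unsigned long n)
{
    return (n - 1) * period;
}
```

With $T_p = 5$, the third job arrives at time 10, and no job arrives at time 7.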

Next, we need to turn the finite schedule prefix into an infinite one. There are two possibilities: either build a full schedule from the arrival sequence using the Prosa implementation of FPP, or start from the schedule prefix of the interface and extend it into an infinite one. The first technique gives for free the fact that the infinite schedule satisfies the FPP model from Prosa. The difficulty lies in proving that the schedule prefix from the interface is indeed a prefix of this infinite schedule. The second technique starts from the schedule prefix and the difficulty is proving that it satisfies the FPP model as specified on the Prosa side.

In this paper, we use the first strategy and prove that the prefix of the schedule built by Prosa is equal to the schedule prefix provided in the interface. To do so, we use the fact that two FPP schedule prefixes with the same arrival sequence and job costs (only known at runtime) are the same, provided we take care to properly remember when jobs yield.
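The first strategy relies on the FPP schedule being a deterministic function of the arrivals and job costs. The hypothetical C sketch below builds such a schedule slot by slot. Task index 0 has the highest priority, and each task's pending work is kept as a single backlog, which is sound here because jobs of the same task run in arrival order. Feeding it the same arrivals and costs always yields the same prefix, which is the intuition behind the equality proof.

```c
#include <stddef.h>

#define MAX_TASKS 8   /* bound assumed by this sketch: ntasks <= MAX_TASKS */

/* Fixed-priority preemptive schedule over slots [0, horizon): task p
 * releases job k at k * period[p] with cost costs[p][k]; sched[t] gets
 * the running task, or -1 when idle. */
static void fpp_schedule(int ntasks, const unsigned long *period,
                         const unsigned *const *costs,
                         unsigned long horizon, int *sched)
{
    unsigned long backlog[MAX_TASKS] = {0};   /* remaining work per task */
    for (unsigned long t = 0; t < horizon; t++) {
        for (int p = 0; p < ntasks; p++)      /* job arrivals at time t */
            if (t % period[p] == 0)
                backlog[p] += costs[p][t / period[p]];
        sched[t] = -1;
        for (int p = 0; p < ntasks; p++)      /* highest priority first */
            if (backlog[p] > 0) {
                sched[t] = p;
                backlog[p]--;
                break;
            }
    }
}
```

With periods 4 and 6 and job costs (1, 2) and (3, 1), the first eight slots are 0, 1, 1, 1, 0, 0, 1, idle.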

Assuming that the task set is accepted by the schedulability criterion, we know that the Prosa schedule is schedulable and, since this implies that its prefix is also schedulable, we deduce that the valid schedule prefix given by the interface is schedulable.

#### **5 Evaluation and Future Work**

#### **5.1 Evaluation**

As the C and assembly source code of RT-CertiKOS was not modified at all, this connection introduces no performance overhead, and there is no need for a new performance evaluation. Instead, we focus on the benefits this work brings and on the amount of work involved, described in Table 2.

**Benefits for RT-CertiKOS and Prosa.** The schedulability analysis already present in RT-CertiKOS was proved manually and took around 8k LoC to handle the precise setting described in this paper. By contrast, interfacing with Prosa requires 50% less proof, is more flexible, and can easily be extended (see Sect. 5.3). The introduction of a simplified scheduling model also reduced the size of the invariant proofs about the high-level abstract scheduler by 75%, since we are freed from the unnecessary information described in Sect. 4.2.

On the Prosa side, having a complete formal connection with an actual OS kernel developed independently validates the modeling choices made for describing real-time systems. Indeed, seeing schedulers as predicates over scheduling traces is very general, but one can legitimately wonder whether such predicates accurately describe reality.

**Proof Effort.** Designing a good interface allowed us to cleanly separate the work required on the RT-CertiKOS and Prosa sides.

On the RT-CertiKOS side, the design of the simplified scheduling setting was fairly straightforward, as was the proof of correctness of the translation. Indeed, this translation is essentially a projection, except for batch tasks, which are removed. Designing adequate inductive invariants to prove the two properties required by the interface was the most challenging part of this work and, unsurprisingly, it took several iterations to find correct definitions.

On the Prosa side, building the arrival sequence and the infinite schedule is nearly effortless given a prefix and a job cost function. The subtle part was finding a good definition of the job cost function, which made the corresponding proofs significantly easier. Proving that the prefix of the built infinite schedule is the same as the interface prefix *w.r.t.* executions was troublesome for two reasons. First, the interface prefix contains an additional boolean recording whether the scheduled job yielded, used for computing job costs, which does not exist in the built schedule. Second, the definition of the FPP property in the interface depends on a schedule prefix, while the one in Prosa depends on an infinite schedule.

Overall, we see the small amount of LoC required to perform this work as a validation of the adequacy of our method to the considered problem.


**Table 2.** Proof effort

#### **5.2 Lessons Learned**

Beyond the particular artifact linking RT-CertiKOS with Prosa, what more general lessons can we learn from this connection?

First, using the same proof assistant greatly helps. Beyond avoiding the technical hassle of interoperability between different formal tools, it also avoids the pitfall of a formalization mismatch between the two formal models and permits sharing common definitions.

Second, the creation of an explicit interface between both tools clearly marks the flow of information, stays focused on the essential information, and delimits the "proof responsibility": which side is responsible for proving which fact. It also segregates the proof techniques used on each side so that neither pollutes the other, whether on a technical level (vanilla Coq for RT-CertiKOS *vs* the SSReflect extension for Prosa) or in the verification methods used (invariant-based properties for RT-CertiKOS *vs* trace-based properties for Prosa). This separation means nobody needs to be an expert in both tools at once: once the interface was clearly defined, experts on each side could work with only a rough description of the other one, even though the interface required a few later changes. In particular, half the authors are experts in RT-CertiKOS whereas the other half are experts in Prosa.

Third, the common part of the models used by both sides must be amenable to agreement: in our case, this means having the same notion of time (scheduling slots, or ticks) and a compatible notion of schedule (finite and infinite).

Finally, we expect the interface we designed to be reusable for other verified kernels wanting to connect to Prosa or for linking RT-CertiKOS to other formal schedulability analysis tools.

#### **5.3 Future Work**

**Evolving with RT-CertiKOS.** The existing implementation of the scheduler in RT-CertiKOS imposes a fixed-priority scheduling policy with implicit deadlines. In the future, as RT-CertiKOS evolves and supports more task models, the interface connecting it with Prosa should be extended accordingly.

A straightforward extension is to allow *constrained deadlines*, that is, to have the deadline $D_p$ be shorter than the period $T_p$ (but greater than the WCET $C_p$), as the schedulability result we use from Prosa already supports it. This requires RT-CertiKOS to support an extended task model where a task is also specified by its deadline. Furthermore, RT-CertiKOS would also need to enforce budgets at the deadlines, instead of at the beginning of the next period as is currently the case.

Another extension would be to consider the Earliest Deadline First (EDF) scheduling policy, which can achieve higher processor utilization. Besides relaxing the current task model by dropping priorities, the main proof effort would be to implement and verify this new scheduler in RT-CertiKOS.

**Extensions to Prosa.** Our experience connecting RT-CertiKOS and Prosa shows that Prosa's assumption of having an infinite schedule is quite impractical when verifying instances of real-time systems. This advocates for building reusable connections between Prosa's system model based on infinite traces and a model similar to the one used in the interface with RT-CertiKOS. Thus, one would prove analyses in the convenient setting of infinite traces and still be able to apply them to lower level models of real-time systems with finite traces.

#### **6 Related Work**

**Schedulability Analysis.** Schedulability analysis, a key theory in the real-time community, has been widely studied in the past decades. Liu and Layland's seminal work [20] presents a schedulability analysis technique for a simple system model described as a set of assumptions. Many later works [3,5,11,23,28] aim to capture more *realistic*<sup>5</sup> and complex system models by generalizing those assumptions.

To provide formal guarantees for those results, several formal approaches have been used to formalize schedulability analyses, such as model checking [8,12,16], temporal logic [32,33], and theorem proving [10,30].

As far as we know, none of the above work has been applied to a formally verified OS kernel.

**Verification of Real-Time OS Kernels.** There is a large body of work on formal verification of OS kernels; see [18] for a survey. We therefore restrict our attention to verification of real-time kernels using proof assistants. We also do not consider WCET computation, be it of the kernel itself (*e.g.,* [6,24]) or of the task set we consider: obtaining verified WCET bounds is a complementary but clearly distinct problem.

<sup>5</sup> In terms of executions and arrival model.

The eChronos OS [1,2] is a real-time OS running on single-core embedded systems. It stops its verification at the scheduling policy level, proving that the currently running task always has the highest priority among ready tasks. Xu et al. [31] verify the functional correctness of μC/OS-II [19], a real-time operating system with optimizations such as bitmaps. They also prove some high level properties, such as priority inversion freedom of shared memory IPC.

RT-CertiKOS [21] is a verified single-core real-time OS kernel developed by the Yale FLINT group, based on sequential CertiKOS [9,13]. It proves both temporal and spatial isolation among different components, where temporal isolation entails schedulability, etc. However, as explained in Sect. 5.1, its schedulability proof is longer whereas connecting to an existing schedulability analyzer is easier and more flexible.

#### **7 Conclusion**

Formal verification aims at providing stronger guarantees than testing. Real-time systems are a good target because they are often part of critical systems. Both the scheduling and OS communities have developed their own formally verified tools, but there is a lack of integration between them. In this paper, we make a first step toward bridging this gap by integrating a formally proven schedulability analysis tool, Prosa, with a verified sequential real-time OS kernel, RT-CertiKOS. This gives two benefits: first, it provides RT-CertiKOS with a modular, extensible, state-of-the-art formal schedulability analysis proof; second, it gives a concrete instance of one of the scheduling theories described in Prosa, thus ensuring that its model is consistent and applicable to actual systems. We believe this connection can be easily adapted for other verified kernels or schedulability analyzers.

It also showcases that it is possible and practical to connect two completely independent medium- to large-scale formal proof developments.

**Acknowledgments.** This research has been partially supported by the following grants: PEPS INS2I JCJC 2019 Vefose, NSF grants 1521523, 1715154, and 1763399, DARPA grant FA8750-15-C-0082, as well as by the RT-PROOFS project (grant ANR-17-CE25-0016) and the CASERM project through the LabEx PERSYVAL-Lab (grant ANR-11-LABX-0025-01). The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Rely-Guarantee Reasoning About Concurrent Memory Management in Zephyr RTOS**

Yongwang Zhao<sup>1,2</sup>(✉) and David Sanán<sup>3</sup>

<sup>1</sup> School of Computer Science and Engineering, Beihang University, Beijing, China zhaoyw@buaa.edu.cn

<sup>2</sup> Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing, China

<sup>3</sup> School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore

**Abstract.** Formal verification of concurrent operating systems (OSs) is challenging, particularly for dynamic memory management, due to its complex data structures and allocation algorithm. To the best of our knowledge, this paper presents the first formal specification and mechanized proof of a concurrent buddy memory allocation for a real-world OS. We develop a fine-grained formal specification of the buddy memory management in Zephyr RTOS. To ease validation of the specification against the source code, the specification closely follows the C code. Then, we use the rely-guarantee technique to conduct the compositional verification of functional correctness and invariant preservation. During the formal verification, we found three bugs in the C code of Zephyr.

#### **1 Introduction**

The operating system (OS) is a fundamental component of critical systems. Thus, the correctness and reliability of systems highly depend on the system's underlying OS. As a key functionality of OSs, memory management provides ways to dynamically allocate portions of memory to programs at their request, and to free them for reuse when no longer needed. Since program variables and data are stored in the allocated memory, an incorrect specification or implementation of memory management may lead to system crashes or exploitable attacks on the whole system. RTOSs are frequently deployed in critical systems, making their formal verification necessary to ensure their reliability. One state-of-the-art RTOS is Zephyr RTOS [1], a Linux Foundation project. Zephyr is an open source RTOS for connected, resource-constrained devices, built with security and safety design in mind. Zephyr uses a buddy memory allocation algorithm optimized for RTOSs that allows multiple threads to concurrently manipulate shared memory pools with fine-grained locking.

This work has been supported in part by the National Natural Science Foundation of China (NSFC) under the Grant No.61872016, and the National Satellite of Excellence in Trustworthy Software Systems and the Award No. NRF2014NCR-NCR001-30, funded by NRF Singapore under National Cyber-security R&D (NCR) programme.

Formal verification of the concurrent memory management in Zephyr is challenging for three reasons. (1) To achieve high performance, data structures and algorithms in Zephyr are laid out in a complex manner. Buddy memory allocation can split large blocks into smaller ones, allowing blocks of different sizes to be allocated and released efficiently while limiting memory fragmentation. Seeking performance, Zephyr uses a multi-level structure where each level has a bitmap and a linked list of free memory blocks. The levels of bitmaps actually form a forest of quad trees of bits. Memory addresses are used as references to memory blocks, so the algorithm has to deal with address alignment and with computing the block size at each level, which increases the complexity of its verification. (2) Complex algorithms and data structures also imply complex invariants that the formal model must preserve. These invariants have to guarantee that the bitmaps are well-shaped and consistent with the free lists. To prevent memory leaks and block overlapping, precise reasoning must keep track of both numerical and shape properties. (3) Thread preemption and fine-grained locking make the kernel execution of memory services concurrent.
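The quad-tree shape of the bitmaps boils down to index arithmetic. The helpers below are a hypothetical sketch, not Zephyr functions. They assume that the four children of block $j$ at level $l$ are blocks $4j$ to $4j+3$ at level $l+1$, which is the layout that the *noexist_bits* check on b!j to b!(j+3) in Sect. 3 also presupposes.

```c
#include <stddef.h>

/* Parent of block j (at the level above). */
static size_t parent_index(size_t j) { return j / 4; }

/* First of the four children of block j (at the level below). */
static size_t first_child_index(size_t j) { return 4 * j; }

/* First of the four buddies sharing block j's parent. */
static size_t first_buddy_index(size_t j) { return j - j % 4; }

/* Block j at level l covers blocks [*lo, *hi) at level l + d,
 * i.e. [j * 4^d, (j + 1) * 4^d). */
static void descendant_range(size_t j, unsigned d, size_t *lo, size_t *hi)
{
    size_t scale = 1;
    while (d-- > 0)
        scale *= 4;
    *lo = j * scale;
    *hi = (j + 1) * scale;
}
```

The descendant ranges of distinct blocks at one level are disjoint and together cover the next levels. This is the arithmetic behind the "no overlap, full coverage" property derived from well-shaped bitmaps.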

In this paper, we apply the rely-guarantee reasoning technique to the concurrent buddy memory management in Zephyr. This work uses π-Core, a rely-guarantee framework for the specification and verification of concurrent reactive systems. π-Core introduces a concurrent imperative system specification language driven by "events" that supports the reactive semantics of interrupt handlers (e.g. kernel services, scheduler) in OSs, and thus makes the formal specification of Zephyr simpler. The language embeds Isabelle/HOL data types and functions, so it is as expressive as Isabelle/HOL itself. π-Core concurrent constructs allow the specification of Zephyr multi-thread interleaving, fine-grained locking, and thread preemption. The compositionality of rely-guarantee makes it feasible to prove the functional correctness of Zephyr and invariants over its data structures. The formal specification and proofs are developed in Isabelle/HOL. They are available at https://lvpgroup.github.io/picore/.

We first analyze the structural properties of memory pools in Zephyr (Sect. 3). The properties clarify the constraints and consistency of quad trees, free block lists, memory pool configuration, and waiting threads. All of them are defined as invariants whose preservation under the execution of services is formally verified. From the well-shaped properties of quad trees, we can derive a critical property to prevent memory leaks: memory blocks cover the whole address space of the pool but do not overlap each other.

Together with the formal verification of Zephyr, we aim at the highest evaluation assurance level (EAL 7) of the Common Criteria (CC) [2], which was declared this year as the candidate standard for security certification by the Zephyr project. Therefore, we develop a fine-grained, low-level formal specification of buddy memory management (Sect. 4). The specification has a line-to-line correspondence with the Zephyr C code and thus enables the *code-to-spec* review required by the EAL 7 evaluation, covering all the data structures and imperative statements present in the implementation.

We carry out the formal verification of functional correctness and invariant preservation using a rely-guarantee proof system (Sect. 5), which supports total correctness for loops where fairness does not need to be considered. The formal verification revealed three bugs in the C code: an incorrect block split, an incorrect return from the kernel services, and non-termination of a loop (Sect. 6). Two of them are critical and have been repaired in the latest release of Zephyr. The third bug causes non-termination of the allocation service when trying to allocate a block larger than the maximum allowed size.

*Related Work.* (1) Memory models [17] provide the necessary abstraction to separate the behaviour of a program from the behaviour of the memory it reads and writes. There are many formalizations of memory models in the literature, e.g., [10,14,15,19,21], some of which only create an abstract specification of the services for memory allocation and release [10,15,21]. (2) Formal verification of OS memory management has been studied in CertiKOS [11,20], seL4 [12,13], Verisoft [3], and in the hypervisors from [4,5], where only the works in [4,11] consider concurrency. Compared to buddy memory allocation, the data structures and algorithms verified in [11] are relatively simple, without block split/coalescence and multiple levels of free lists and bitmaps. The work in [4] only considers virtual mapping, not the allocation or deallocation of memory areas. (3) Algorithms and implementations of dynamic memory allocation have been formally specified and verified in an extensive number of works [7–9,16,18,23]. However, buddy memory allocation is only studied in [9], which considers neither concrete data structures (e.g. bitmaps) nor concurrency. To the best of our knowledge, this paper presents the first formal specification and mechanized proof for a concurrent buddy memory allocation of a realistic operating system.

#### **2 Concurrent Memory Management in Zephyr RTOS**

In Zephyr, a memory pool is a kernel object that allows memory blocks to be dynamically allocated from a designated memory region and released back into the pool. Its definition in the C code is shown below. A memory pool's buffer (∗buf) is an array of *n_max* blocks of *max_sz* bytes at level 0, with no wasted space between them. The size of the buffer is thus $\mathit{n\_max} \times \mathit{max\_sz}$ bytes. Zephyr tries to satisfy a memory request by splitting available blocks into smaller ones that fit the requested size as closely as possible. Each "level 0" block is a quad-block that can be split into four smaller "level 1" blocks of equal size. Likewise, each level 1 block is itself a quad-block that can be split again. At each level, the four smaller blocks become *buddies* or *partners* to each other. The block size at level $l$ is thus $\mathit{max\_sz}/4^l$.

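```
struct k_mem_block_id {
  u32_t pool : 8;    /* identifies the pool */
  u32_t level : 4;   /* level of the block */
  u32_t block : 20;  /* index of the block within its level */
};
struct k_mem_pool_lvl {
  union {
    u32_t *bits_p;   /* bitmap stored in a separate array */
    u32_t bits;      /* bitmap stored inline (small levels) */
  };
  sys_dlist_t free_list;  /* free blocks at this level */
};
struct k_mem_block {
  void *data;             /* start address of the block */
  struct k_mem_block_id id;
};
struct k_mem_pool {
  void *buf;              /* start of the pool buffer */
  size_t max_sz;          /* size of a level-0 block */
  u16_t n_max;            /* number of level-0 blocks */
  u8_t n_levels;          /* number of levels */
  u8_t max_inline_level;  /* bound on levels using the inline bits */
  struct k_mem_pool_lvl *levels;  /* per-level bitmap and free list */
  _wait_q_t wait_q;       /* threads waiting for a block */
};
```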
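Returning to the sizes described before these definitions, the per-level arithmetic can be sketched with a few hypothetical helpers (not Zephyr functions). The sketch assumes that *max_sz* is divisible by $4^l$ for every level $l$ considered.

```c
#include <stddef.h>

/* Block size at level l: max_sz / 4^l. */
static size_t block_size(size_t max_sz, unsigned l)
{
    while (l-- > 0)
        max_sz /= 4;
    return max_sz;
}

/* Buffer size: n_max level-0 blocks of max_sz bytes, no padding. */
static size_t pool_buffer_size(size_t n_max, size_t max_sz)
{
    return n_max * max_sz;
}

/* Number of blocks at level l: n_max * 4^l. */
static size_t blocks_at_level(size_t n_max, unsigned l)
{
    while (l-- > 0)
        n_max *= 4;
    return n_max;
}
```

Multiplying the number of blocks at any level by the block size at that level gives back the buffer size, so every level tiles the whole buffer.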
The pool is initially configured with the parameters *n_max* and *max_sz*, together with a third parameter *min_sz*. *min_sz* defines the minimum size for an allocated block and must be $4 \times X$ bytes long for some $X > 0$. Memory pool blocks are recursively split into quarters until blocks of the minimum size are obtained, at which point no further split can occur. The depth at which *min_sz* blocks are allocated is *n_levels* and satisfies $\mathit{max\_sz} = \mathit{min\_sz} \times 4^{\mathit{n\_levels}}$.

Every memory block is composed of a level; a block index within the level, ranging from $0$ to $\mathit{n\_max} \times 4^{\mathit{level}} - 1$; and the data representing the block start address, which is equal to $\mathit{buf} + (\mathit{max\_sz}/4^{\mathit{level}}) \times \mathit{block}$. We use a tuple (level, block) to uniquely represent a block within a pool p.
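The start-address formula and its inverse can be sketched in C as follows. The helper names are hypothetical, and *uintptr_t* stands in for addresses, which the specification itself models as natural numbers.

```c
#include <stddef.h>
#include <stdint.h>

/* Start address of block (level, block): buf + (max_sz / 4^level) * block. */
static uintptr_t block_addr(uintptr_t buf, size_t max_sz,
                            unsigned level, size_t block)
{
    size_t sz = max_sz;
    for (unsigned l = 0; l < level; l++)
        sz /= 4;
    return buf + sz * block;
}

/* Inverse mapping, for an address known to start a block at that level. */
static size_t block_index(uintptr_t buf, size_t max_sz,
                          unsigned level, uintptr_t addr)
{
    size_t sz = max_sz;
    for (unsigned l = 0; l < level; l++)
        sz /= 4;
    return (size_t)(addr - buf) / sz;
}
```

A block and its first child share the same start address. For example, block (1, 4) starts where block (0, 1) starts, which is why the level must be stored alongside the address.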

A memory pool keeps track of how its buffer space has been split using, at each level, a linked list *free_list* holding the start addresses of the free blocks. To improve the performance of coalescing partner blocks, memory pools maintain a bitmap at each level to indicate the allocation status of each block in the level. This structure is represented by a C union of an integer *bits* and an array *bits_p*. The implementation can store the bitmaps of levels below *max_inline_level* using only the integer *bits*. However, at levels above *max_inline_level*, the number of blocks makes it necessary to store the bitmap in the array *bits_p*. In such a design, the levels of bitmaps actually form a forest of complete quad trees. The bit $j$ in the bitmap of level $i$ is set to 1 for the block $(i, j)$ iff it is a free block, i.e. it is in the free list at level $i$. Otherwise, the bit for that block is set to 0.
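A minimal sketch of bitmap maintenance follows. It assumes the conventional layout where bit $j$ of a level lives in 32-bit word $j/32$ at position $j \bmod 32$, so a level whose bitmap fits in one word can use the inline *bits*. The helper names are hypothetical and not Zephyr's API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mark block j as free (its address is on this level's free list). */
static void set_free(uint32_t *bits, size_t j)
{
    bits[j / 32] |= (uint32_t)1 << (j % 32);
}

/* Mark block j as not free (allocated, divided, in transition, ...). */
static void clear_free(uint32_t *bits, size_t j)
{
    bits[j / 32] &= ~((uint32_t)1 << (j % 32));
}

static bool is_free(const uint32_t *bits, size_t j)
{
    return (bits[j / 32] >> (j % 32)) & 1u;
}
```

Keeping these updates in step with insertions into and removals from *free_list* is exactly the bitmap/free-list consistency that the invariants must capture.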

**Fig. 1.** The C source code of memory allocation in Zephyr v1.8.0

Zephyr provides two kernel services, *k_mem_pool_alloc* and *k_mem_pool_free*, for memory allocation and release respectively. The main part of the C code of *k_mem_pool_alloc* is shown in Fig. 1. When an application requests a memory block, Zephyr first computes *alloc_l* and *free_l*. *alloc_l* is the level with the size of the smallest block that will satisfy the request, and *free_l*, with *free_l* ≤ *alloc_l*, is the lowest level where there are free memory blocks. Since the services are concurrent, when the service tries to allocate a free block *blk* from level *free_l* (Line 8), blocks at that level may be allocated or merged into a bigger block by other concurrent threads. In such a case, the service backs out (Line 9) and tells the main function *k_mem_pool_alloc* to retry. If *blk* is successfully locked for allocation, it is broken down to level *alloc_l* (Lines 11–14). The allocation service *k_mem_pool_alloc* supports a *timeout* parameter that lets threads wait on the pool for a period of time when the call does not succeed. If the allocation fails (Line 24) and the timeout is not *K_NO_WAIT*, the thread is suspended (Line 30) in a linked list *wait_q* and the context is switched to another thread (Line 31).
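The computation of *alloc_l* and *free_l* can be sketched as below. This is a simplified, hypothetical rendering of the behaviour described above, not the Zephyr source. Here *has_free[l]* stands for "the free list at level $l$ is non-empty", block sizes are obtained by dividing by 4 at each level, and no locking is modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Walk levels from 0 downward: alloc_l is the deepest level whose block
 * size still fits the request, free_l the deepest level up to alloc_l
 * with a free block. Returns false if either level is missing. */
static bool select_levels(size_t max_sz, unsigned n_levels,
                          const bool *has_free, size_t size,
                          int *alloc_l, int *free_l)
{
    size_t lsize = max_sz;
    *alloc_l = -1;
    *free_l = -1;
    for (unsigned l = 0; l < n_levels; l++) {
        if (l > 0)
            lsize /= 4;
        if (lsize < size)
            break;               /* blocks at this level are too small */
        *alloc_l = (int)l;
        if (has_free[l])
            *free_l = (int)l;    /* free_l <= alloc_l by construction */
    }
    return *alloc_l >= 0 && *free_l >= 0;
}
```

In this sketch, a request larger than *max_sz* fails at once because *alloc_l* stays at -1. When *free_l* < *alloc_l*, the block taken from level *free_l* would then be broken down to level *alloc_l*.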

Interrupts are always enabled in both services, except in the functions *alloc_block* and *break_block*, which invoke *irq_lock* and *irq_unlock* to respectively disable and enable interrupts. Like *k_mem_pool_alloc*, the execution of *k_mem_pool_free* is interruptible too.

#### **3 Defining Structures and Properties of Buddy Memory Pools**

As a specification at design level, we use abstract data types to represent the complete structure of memory pools. We use an abstract reference *ref* in Isabelle to define pointers to memory pools. Starting addresses of memory blocks, memory pools, and unsigned integers in the implementation are defined as *natural* numbers (*nat*). The *levels* array and the *free_list* linked lists used in the implementation, together with the bitmaps stored in *bits* and *bits_p*, are defined with the *list* type. C *structs* are modelled in Isabelle as *records* with the same name as in the implementation and comprising the same data, with two exceptions: (1) *k_mem_block_id* and *k_mem_block* are merged into one single record, and (2) the union in the struct *k_mem_pool_lvl* is replaced by a single list representing the bitmap, and thus *max_inline_level* is removed.

**Fig. 2.** Structure of memory pools

The Zephyr implementation uses a bitmap to represent the state of memory blocks. The bit $j$ of the bitmap for level $i$ is set to 1 iff the memory address of the memory block $(i, j)$ is in the free list at level $i$. A bit $j$ at level $i$ is set to 0 under the following conditions: (1) its corresponding memory block is allocated (*ALLOCATED*), (2) the memory block has been split (*DIVIDED*), (3) the memory block is being split in the allocation service (*ALLOCATING*) (Line 13 in Fig. 1), (4) the memory block is being coalesced in the release service (*FREEING*), and (5) the memory block does not exist (*NOEXIST*). Instead of using only a binary representation, our formal specification models the bitmap using a datatype *BlockState* composed of these cases together with *FREE*. The reason for this decision is to simplify proving that the bitmap shape is well-formed. In particular, this representation makes it less complex to verify the case in which the descendant of a free block is a non-free block. This is the case where the last free block has not been split and therefore lower levels do not exist. We illustrate the structure of a memory pool in Fig. 2. The top of the figure shows the real memory of the first block at level 0.

The structural properties capture the constraints on, and the consistency of, the quad trees, free block lists, memory pool configuration, and waiting threads. All of them are invariants on the kernel state and have been formally verified on the formal specification in Isabelle/HOL.

*Well-Shaped Bitmaps.* We say that the logical memory block j at level i physically exists iff bit j of the bitmap for level i is *ALLOCATED*, *FREE*, *ALLOCATING*, or *FREEING*; this is captured by the predicate *is-memblock*. We do not consider blocks marked as *DIVIDED* to be physical blocks, since such a block is only a logical block containing other blocks. Threads may split and coalesce memory blocks. A valid forest is defined by the following rules: (1) the parent bit of an existing memory block is *DIVIDED* and its child bits are *NOEXIST*, where the predicate *noexist-bits* checks, for a given bitmap b and position j, that the nodes b!j to b!(j + 3) are all *NOEXIST*; (2) the parent bit of a *DIVIDED* block is also *DIVIDED*; and (3) the child bits of a *NOEXIST* bit are also *NOEXIST*, and its parent cannot be a *DIVIDED* block. The property is defined as the predicate **inv-bitmap**(s), where s is the state.

There are two additional properties on bitmaps. First, the address space of a memory pool cannot be empty, i.e., the bits at level 0 must differ from *NOEXIST*. Second, the allocation algorithm may split a memory block into smaller ones, but not the blocks at the lowest level (i.e., level $n\_levels - 1$), so the bits at the lowest level cannot be *DIVIDED*. The first property is defined as **inv-bitmap0**(s) and the second as **inv-bitmapn**(s).
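The three bitmap invariants can be read as executable checks over a finite bitmap. The following Python sketch is a hypothetical transcription, not the Isabelle definitions: states are plain strings standing in for the *BlockState* constructors, and `bits[i][j]` is the state of block j at level i.

```python
from typing import List, Optional

FREE, ALLOCATED, ALLOCATING, FREEING = "FREE", "ALLOCATED", "ALLOCATING", "FREEING"
DIVIDED, NOEXIST = "DIVIDED", "NOEXIST"
PHYSICAL = {FREE, ALLOCATED, ALLOCATING, FREEING}

Bitmaps = List[List[str]]  # bits[i][j]: state of block j at level i


def is_memblock(st: str) -> bool:
    return st in PHYSICAL


def noexist_bits(b: List[str], j: int) -> bool:
    """b!j .. b!(j + 3) are all NOEXIST."""
    return all(b[k] == NOEXIST for k in range(j, j + 4))


def inv_bitmap(bits: Bitmaps) -> bool:
    for i, level in enumerate(bits):
        for j, st in enumerate(level):
            parent: Optional[str] = bits[i - 1][j // 4] if i > 0 else None
            parent_ok = parent in (None, DIVIDED)
            kids_noexist = i + 1 == len(bits) or noexist_bits(bits[i + 1], 4 * j)
            if is_memblock(st):
                ok = parent_ok and kids_noexist           # rule (1)
            elif st == DIVIDED:
                ok = parent_ok                            # rule (2)
            else:
                ok = kids_noexist and parent != DIVIDED   # rule (3): st is NOEXIST
            if not ok:
                return False
    return True


def inv_bitmap0(bits: Bitmaps) -> bool:
    return all(st != NOEXIST for st in bits[0])


def inv_bitmapn(bits: Bitmaps) -> bool:
    return all(st != DIVIDED for st in bits[-1])
```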

*Consistency of the Memory Configuration.* The configuration of a memory pool is set when it is initialized. Since the minimum block size is aligned to 4 bytes, there must exist an $n > 0$ such that the maximum block size *max_sz* of a pool is equal to $4 \times n \times 4^{n\_levels}$, relating the number of levels of a level 0 block to its size. Moreover, the number of blocks at level 0 (*n_max*) and the number of levels (*n_levels*) have to be greater than zero, since the memory pool cannot be empty. The number of levels is equal to the length of the pool's *levels* list. Finally, the length of the bitmap at level $i$ should be $n\_max \times 4^i$. This property is defined as **inv-mempool-info**(s).
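A direct Python reading of **inv-mempool-info**, as a sketch: it takes the configuration fields and the bitmap lengths rather than a full kernel state (a simplification of this example), and it encodes the existential over n as a divisibility test, since "there is an n > 0 with max_sz = 4 × n × 4^n_levels" holds exactly when max_sz is a positive multiple of 4 × 4^n_levels.

```python
from typing import List


def inv_mempool_info(max_sz: int, n_max: int, n_levels: int,
                     bitmap_lengths: List[int]) -> bool:
    """Configuration consistency, following the formula stated in the text."""
    if n_max <= 0 or n_levels <= 0:          # the pool cannot be empty
        return False
    unit = 4 * 4 ** n_levels                 # max_sz == n * unit for some n > 0
    if max_sz <= 0 or max_sz % unit != 0:
        return False
    if len(bitmap_lengths) != n_levels:      # one level record per level
        return False
    # bitmap at level i has n_max * 4^i entries
    return all(length == n_max * 4 ** i for i, length in enumerate(bitmap_lengths))
```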

*Memory Partition Property.* Memory blocks partition the pool they belong to, so the absence of overlapping blocks and the absence of memory leaks are critical properties. For a memory block of index j at level i, its address space is the interval $[j \times (max\_sz/4^i), (j+1) \times (max\_sz/4^i))$. For any relative memory address addr in the memory domain of a memory pool, i.e., $addr < n\_max \times max\_sz$, there is exactly one memory block whose address space contains addr. The property is defined as **mem-part**(s).
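The partition property can be checked by brute force on small pools. In this Python sketch (hypothetical helpers, string states standing in for *BlockState*), the interval of block j at level i contains addr exactly when j = addr // (max_sz / 4^i), so only one candidate block per level needs to be inspected.

```python
from typing import List, Tuple

PHYSICAL = {"FREE", "ALLOCATED", "ALLOCATING", "FREEING"}


def block_interval(max_sz: int, i: int, j: int) -> Tuple[int, int]:
    """Half-open interval [j * (max_sz / 4^i), (j + 1) * (max_sz / 4^i))."""
    size = max_sz // 4 ** i
    return j * size, (j + 1) * size


def mem_part(bits: List[List[str]], max_sz: int, n_max: int) -> bool:
    """Every relative address below n_max * max_sz lies in exactly one physical block."""
    for addr in range(n_max * max_sz):
        owners = 0
        for i, level in enumerate(bits):
            j = addr // (max_sz // 4 ** i)   # the only level-i block that can contain addr
            if j < len(level) and level[j] in PHYSICAL:
                owners += 1
        if owners != 1:                      # 0: leak/hole, >1: overlap
            return False
    return True
```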

From the invariants of the bitmap, we derive the general property for the memory partition.

**Theorem 1 (Memory Partition).** *For any kernel state* s*, if the memory pools in* s *are consistent in their configuration and their bitmaps are well-shaped, then the memory pools satisfy the partition property in* s*:*

$$\textit{inv-mempool-info}(s) \wedge \textit{inv-bitmap}(s) \wedge \textit{inv-bitmap0}(s) \wedge \textit{inv-bitmapn}(s) \Longrightarrow \textit{mem-part}(s)$$

Together with the memory partition property, pools must also satisfy the following:

*No Partner Fragmentation.* The memory release algorithm in Zephyr coalesces free partner memory blocks into blocks that are as large as possible, at every level below the root level (level 0 blocks are never coalesced). Thus, a memory pool never contains four partner bits that are all *FREE*.

*Validity of Free Block Lists.* The free list at a level keeps the starting addresses of free memory blocks. The memory management ensures that the addresses in the list are valid, i.e., they are pairwise distinct and aligned to the *block size*, which at level i is $max\_sz/4^i$. Moreover, a memory block is in the free list iff the corresponding bit of the bitmap is *FREE*.
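Both properties above can be sketched as checks in Python. The assumptions are those of this example only: free lists hold relative addresses, partner groups are the aligned blocks 4k to 4k + 3, and states are strings standing in for *BlockState*.

```python
from typing import List


def no_partner_fragmentation(bits: List[List[str]]) -> bool:
    """Below the root level, no aligned partner group is entirely FREE."""
    for level in bits[1:]:
        for k in range(0, len(level), 4):
            if all(st == "FREE" for st in level[k:k + 4]):
                return False
    return True


def free_list_valid(bits: List[List[str]], free_lists: List[List[int]],
                    max_sz: int) -> bool:
    for i, (level, fl) in enumerate(zip(bits, free_lists)):
        bsz = max_sz // 4 ** i                       # block size at level i
        if len(set(fl)) != len(fl):                  # pairwise distinct
            return False
        if any(addr % bsz != 0 for addr in fl):      # aligned to the block size
            return False
        listed = {addr // bsz for addr in fl}
        free = {j for j, st in enumerate(level) if st == "FREE"}
        if listed != free:                           # listed iff the bit is FREE
            return False
    return True
```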

*Non-overlapping of Memory Pools.* The memory spaces of the pools defined in a system must be disjoint, so the memory addresses of a pool do not belong to the memory space of any other pool.

*Other Properties.* The state of a suspended thread in *wait_q* has to be consistent with the threads waiting for a memory pool. Threads can only be blocked once, and threads waiting for available memory blocks have to be in the *BLOCKED* state. During the allocation and release of a memory block, blocks of the tree may temporarily be manipulated by the splitting and coalescing processes. A block can be manipulated by only one thread at a time, and the state bit of a block being temporarily manipulated has to be *FREEING* or *ALLOCATING*.

#### **4 Formalizing Zephyr Memory Management**

For the purpose of formal verification of event-driven systems such as OSs, we have developed π-Core, a framework for rely-guarantee reasoning about components that run in parallel and invoke events. π-Core supports concurrent OS features such as shared-variable concurrency of multiple threads, interruptible execution of handlers, self-suspending threads, and rescheduling. In this section, we first introduce the modelling language of π-Core and an execution model of Zephyr in this language. Then we discuss in detail the low-level design specification of the kernel services that the memory management provides. Since this work focuses on the memory management, we only provide very abstract models of other kernel functionalities such as scheduling and thread control.

#### **4.1 Event-Based Execution Model of Zephyr**

*The Language in* π*-Core*. Interrupt handlers in π-Core are modelled as reactive services, represented as *events*:

$$\mathbf{EVENT}\ E\ [p_1, \ldots, p_n]\,@\,\kappa\ \mathbf{WHEN}\ g\ \mathbf{THEN}\ P\ \mathbf{END}$$

In this representation, an event is a parametrized imperative program P with a name E, a list of service input parameters $p_1, \ldots, p_n$, and a guard condition g that determines when the event can be triggered. In addition to the input parameters, an event has a special parameter κ indicating the execution context, e.g., the scheduler or the thread invoking the event. The imperative commands of an event body P in π-Core are standard sequential constructs such as conditional execution, loops, and sequential composition of programs. The language also includes a synchronization construct for concurrent processes, **AWAIT** b **THEN** P **END**: the body P is executed atomically if and only if the boolean condition b holds, and the statement does not progress otherwise. **ATOM** P **END** denotes an *Await* statement whose guard is *True*.

Threads and kernel processes have their own execution context and local states. Each of them is modelled in π-Core as a set of events called an *event system*, denoted **ESYS** S ≡ $\{E_0, \ldots, E_n\}$. The operational semantics of an event system is the *sequential composition* of the executions of the events composing it. It consists of the continuous evaluation of the guards of the system events: from the set of events whose guard g holds in the current state, one event E is non-deterministically selected to be triggered, and its body P is executed. After P finishes, the evaluation of the guards starts again to select the next event to be executed. Finally, π-Core has a construct for the parallel composition of event systems $esys_0 \parallel \ldots \parallel esys_n$, which interleaves the execution of the events composing each event system $esys_i$ for $0 \le i \le n$.
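The guard-evaluation loop of a single event system can be sketched in Python as follows. This is a simplification: each event body runs atomically here, whereas in π-Core the bodies of events from parallel event systems interleave at the statement level; `run_event_system` and the `(name, guard, body)` encoding are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Event = Tuple[str, Callable[[State], bool], Callable[[State], None]]  # (name, guard, body)


def run_event_system(events: List[Event], state: State,
                     max_steps: int = 100, seed: int = 0) -> List[str]:
    """Repeatedly pick one enabled event at random and run its body."""
    rng = random.Random(seed)
    trace = []
    for _ in range(max_steps):
        enabled = [e for e in events if e[1](state)]   # evaluate all guards
        if not enabled:
            break
        name, _guard, body = rng.choice(enabled)       # non-deterministic choice
        body(state)                                    # body completes, then guards again
        trace.append(name)
    return trace


def inc(s: State) -> None:
    s["x"] += 1


demo_state = {"x": 0}
demo_trace = run_event_system([("inc", lambda s: s["x"] < 3, inc)], demo_state)
```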

**Fig. 3.** An execution model of Zephyr memory management

*Execution Model of Zephyr*. Setting its initialization aside, an OS kernel can be considered a reactive system that stays in an *idle* loop until it receives an interrupt, which is handled by an interrupt handler. Whereas the execution of interrupt handlers is atomic in sequential kernels, it can be interrupted in concurrent kernels [6,22], allowing services invoked by threads to be interrupted and resumed later. In the execution model of Zephyr, we consider a scheduler S and a set of threads $t_1, \ldots, t_n$. In this model, the execution of the scheduler is atomic since kernel services cannot interrupt it. Kernel services, however, can be interrupted via the scheduler, i.e., the execution of a memory service invoked by a thread $t_i$ may be interrupted by the kernel scheduler to execute a thread $t_j$. Figure 3 illustrates the Zephyr execution model, where solid lines represent execution steps of the threads/kernel services and dotted lines represent the suspension of the thread/code. For instance, the execution of *k_mem_pool_free* in thread $t_1$ is interrupted by the scheduler, and the context is switched to thread $t_2$, which invokes *k_mem_pool_alloc*. During the execution of $t_2$, the kernel service may suspend the thread and switch to another thread $t_n$ by calling *rescheduling*. Later, the execution is switched back to $t_1$, which continues the execution of *k_mem_pool_free* in a different state from the one in which it was interrupted.

The event systems of Zephyr are illustrated in the right part of Fig. 3. A user thread $t_i$ invokes the allocation/release services, so the event system for $t_i$ is $esys_{t_i}$, a set composed of the events *alloc* and *free*. The input parameters of these events correspond to the arguments of the service implementation, and they are constrained by the guard of each service. Together with the system users, we model the event system for the scheduler, $esys_{sched}$, consisting of a single event *sched* whose argument is a thread t to be scheduled when t is in the *READY* state. The formal specification of the memory management is the parallel composition of the event systems for the threads and the scheduler: $esys_{t_1} \parallel \ldots \parallel esys_{t_n} \parallel esys_{sched}$.

*Thread Context and Preemption*. Events are parametrized by a thread identifier used to access the execution context of the thread invoking them. As shown in Fig. 3, the execution of an event by a thread can be stopped by the scheduler and resumed later. This behaviour is modelled using a global variable cur that indicates the thread that is currently scheduled and executing, and by conditioning the execution of the events parametrized by t on t being scheduled. This is achieved by wrapping each statement p of such an event as **AWAIT** cur = t **THEN** p **END**, so that an event invoked by a thread t only progresses when t is scheduled. This scheme allows rely-guarantee reasoning to be used for the concurrent execution of threads on mono-core architectures, where only the scheduled thread is able to modify the memory.
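The effect of guarding every statement with **AWAIT** cur = t can be sketched with Python iterators, one per thread, where each `next()` performs one statement. The schedule is an assumed sequence of values chosen for cur by the atomic *sched* event; a thread other than cur simply does not progress. The usage mirrors Fig. 3: *free* in t1 is interrupted, t2 runs *alloc*, and t1 later resumes.

```python
from typing import Dict, Iterator, List, Tuple


def interleave(threads: Dict[str, Iterator[str]],
               schedule: List[str]) -> List[Tuple[str, str]]:
    """Run one statement per scheduling decision.

    Every statement of thread t is modelled as AWAIT cur = t THEN stmt END,
    so only the thread equal to cur can take a step; the others stay blocked.
    """
    trace = []
    for cur in schedule:                  # value of cur set by the sched event
        stmts = threads.get(cur)
        if stmts is None:
            continue
        stmt = next(stmts, None)          # None: the thread's event has finished
        if stmt is not None:
            trace.append((cur, stmt))
    return trace


demo_trace = interleave(
    {"t1": iter(["free:1", "free:2", "free:3"]), "t2": iter(["alloc:1", "alloc:2"])},
    ["t1", "t2", "t2", "t1", "t1"],
)
```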

#### **4.2 Formal Specification of Memory Management Services**

This section discusses the formal specification of the memory management services. These services deal with the initialization of pools, and memory allocation and release.

*System State*. The system state includes the memory model introduced in Sect. 3, together with the thread under execution in the variable cur and the local variables of the memory services, which are used to keep temporary changes to the structure, guards in conditional and loop statements, and index accesses. The memory model is represented as a set *mem-pools* storing the references of all memory pools and a mapping *mem-pool-info* to query a pool by a pool reference. Local variables are modelled as total functions from threads to variable values, reflecting that the event accesses the thread context. In the formal model of the events, we write ´c for access to a state component c, and ´c t for the value of a local component c for the thread t. The local variables *allocating-node* and *freeing-node* are relevant for the memory services: they store the blocks temporarily being split/coalesced in the alloc/release services, respectively.

*Memory Pool Initialization*. Zephyr defines and initializes memory pools at compile time by constructing a static variable of type *struct k_mem_pool*. The implementation initializes each pool with *n_max* level 0 blocks of size *max_sz* bytes. The bitmaps of level 0 are set to 1, and the free list contains all level 0 blocks. The bitmaps and free lists of the other levels are initialized to 0 and to the empty list, respectively. In the formal model, we specify a state corresponding to the initial state of the implementation, and we show that it belongs to the set of states satisfying the invariant.

*Memory Allocation/Release Services*. The C code of Zephyr uses the recursive function *free_block* to coalesce free partner blocks and the *break* statement to exit loops, neither of which is supported by the imperative language of π-Core. The formal specification overcomes this by transforming the recursion into a loop controlled by the recursion condition, and by using a control variable to exit loops with breaks when the break condition is satisfied. Additionally, the memory management services use the atomic body *irq_lock(); P; irq_unlock();* to keep interrupt handlers *reentrant* by disabling interrupts. We simplify this behaviour in the specification using an **ATOM** statement, which prevents the service from being interrupted at that point. The rest of the formal specification closely follows the implementation: variables are modified using higher-order functions that change the state as the code does. The reason for using Isabelle/HOL functions is that π-Core does not provide a semantics for expressions; instead, it uses state transformers based on higher-order functions to change the state.
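The two program transformations can be illustrated in Python on toy functions (all names are hypothetical): a tail-recursive climb through levels becomes a loop guarded by a control variable, and a `break` becomes a flag in the loop guard. Each pair should compute the same result.

```python
from typing import Callable, List


def climb_recursive(level: int, can_merge: Callable[[int], bool]) -> int:
    """Tail-recursive shape: move one level up while the condition holds."""
    if level > 0 and can_merge(level):
        return climb_recursive(level - 1, can_merge)
    return level


def climb_loop(level: int, can_merge: Callable[[int], bool]) -> int:
    recurse = True                     # control variable mimicking the recursion
    while recurse:
        if level > 0 and can_merge(level):
            level -= 1                 # the "recursive call" with updated arguments
        else:
            recurse = False            # base case: leave the loop
    return level


def first_index_break(xs: List[int], p: Callable[[int], bool]) -> int:
    found = -1
    for i, x in enumerate(xs):
        if p(x):
            found = i
            break
    return found


def first_index_flag(xs: List[int], p: Callable[[int], bool]) -> int:
    found, i, stop = -1, 0, False
    while i < len(xs) and not stop:    # break replaced by a flag in the guard
        if p(xs[i]):
            found, stop = i, True
        else:
            i += 1
    return found
```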

Figure 4 illustrates the π-Core specification of the *free_block* function invoked by *k_mem_pool_free* when releasing a memory block. The code accesses the following variables: lsz, lsizes, and lvl to keep information about the current level; blk, bn, and bb to represent the address and number of the block currently being accessed; freeing-node to represent the node being freed; and i to iterate over blocks. Additionally, the model includes the component free-block-r to model the recursion condition. To simplify the presentation, the model uses predicates and functions to access and modify the state. Due to space constraints, we are unable to explain these functions in detail; however, their names indicate their functionality. We refer readers to the Isabelle/HOL sources for the complete specification of the formal model.

In the C code, *free_block* is a recursive function with two conditions: (1) the block being released belongs to a level higher than zero, since blocks at level zero cannot be merged; and (2) the partner bits of the block being released are *FREE*, so they can be merged into a bigger block. We represent (1) with the predicate ´lvl t > 0 and (2) with the predicate partner-bit-free. The formal specification follows the same structure, translating the recursive function into a loop controlled by a variable that mimics the recursion.

The formal specification of *free_block* first releases an allocated memory block bn by setting it to *FREEING*. Then, the loop body sets the bit of this block to *FREE* (Line 5) and checks whether the iteration/recursion condition holds (Line 7). If the condition holds, the partner bits are set to *NOEXIST* and their addresses are removed from the free list of this level (Lines 12–14). Then, the parent block bit is set to *FREEING* (Lines 17–22), and the variables holding the current block and level numbers are updated before the loop starts over. If the iteration condition does not hold, the bit is set to *FREE*, the block is added to the free list (Lines 24–28), and the loop condition is set to false to end the procedure. Fig. 2 illustrates this function: block 172 is released by a thread, and since its partner blocks (blocks 173–175) are free, Zephyr coalesces the four blocks and sets their parent block 43 to *FREEING*. The coalescence continues iteratively if the partners of block 43 are all free.
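The loop can be replayed on the scenario of Fig. 2 with a small Python model. This is a sketch under assumptions, not the π-Core text of Fig. 4: block numbers stand in for addresses in the free lists, states are strings, and only the blocks involved in the example receive meaningful states (all others are *NOEXIST*).

```python
from typing import List, Set, Tuple

FREE, FREEING, NOEXIST = "FREE", "FREEING", "NOEXIST"


def free_block(bits: List[List[str]], free_lists: List[Set[int]],
               level: int, bn: int) -> Tuple[int, int]:
    """Release block bn at `level`, coalescing partners level by level.

    bits[i][j] is the state of block j at level i; free_lists[i] holds block
    numbers standing in for start addresses. Returns the block finally freed.
    """
    bits[level][bn] = FREEING              # release the allocated block
    free_block_r = True                    # control variable replacing recursion
    while free_block_r:
        bits[level][bn] = FREE
        first = bn - bn % 4                # partners are first .. first + 3
        partners = range(first, first + 4)
        if level > 0 and all(bits[level][b] == FREE for b in partners):
            for b in partners:             # partners merge into their parent
                bits[level][b] = NOEXIST
                free_lists[level].discard(b)
            level, bn = level - 1, bn // 4
            bits[level][bn] = FREEING      # parent (previously DIVIDED) is being freed
        else:
            free_lists[level].add(bn)      # no merge: publish the block
            free_block_r = False
    return level, bn


# Fig. 2 scenario: one level-0 block, five levels; 172 is released, 173-175 are free.
demo_bits = [[NOEXIST] * 4 ** i for i in range(5)]
demo_bits[4][172:176] = ["ALLOCATED", FREE, FREE, FREE]
demo_bits[3][40:44] = ["ALLOCATED", FREE, FREE, "DIVIDED"]
demo_free = [set() for _ in range(5)]
demo_free[4] = {173, 174, 175}
demo_free[3] = {41, 42}
demo_result = free_block(demo_bits, demo_free, 4, 172)
```

Because block 40 is still allocated, coalescing stops at level 3 and block 43 ends up *FREE* and in the level-3 free list.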

#### **5 Correctness and Rely-Guarantee Proof**

We have proven correctness of the buddy memory management in Zephyr using the rely-guarantee proof system of π-Core. We ensure functional correctness of each kernel service w.r.t. the defined pre/post conditions, invariant preservation, termination of loop statements in the kernel services, the preservation of the memory configuration during small steps of kernel services, and the separation of local variables of threads. In this section, we introduce the rely-guarantee proof system of π-Core and how these properties are specified and verified using it.

#### **5.1 Rely-Guarantee Proof Rules and Verification**

A rely-guarantee specification for a system is a quadruple RGCond = ⟨pre, R, G, pst⟩, where pre is the pre-condition, R is the rely condition, G is the guarantee condition, and pst is the post-condition. The intuitive meaning of a valid rely-guarantee specification for a parallel component P, denoted by ⊨ P **sat** ⟨pre, R, G, pst⟩, is that if P is executed from an initial state s ∈ pre and every environment transition belongs to the rely relation R, then the state transitions carried out by P belong to the guarantee relation G and the final states belong to pst.

We have defined a rely-guarantee axiomatic proof system for the π-Core specification language to prove the validity of rely-guarantee specifications, and have proven its soundness in Isabelle/HOL with regard to the definition of validity. Some of the rules of this proof system are shown in Fig. 5.


**Fig. 5.** Typical rely-guarantee proof rules in π-Core

A predicate P is stable w.r.t. a relation R, written stable(P, R), if for any pair of states (s, t) such that s ∈ P and (s, t) ∈ R, we have t ∈ P. Intuitively, an environment represented by R does not affect the satisfiability of P. The parallel rule in Fig. 5 establishes the compositionality of the proof system: verification of the parallel specification can be reduced to the verification of the individual event systems, and then to the verification of individual events. The rule requires that each event system PS(κ) satisfies its specification $\langle pres_\kappa, Rs_\kappa, Gs_\kappa, psts_\kappa \rangle$ (Premise 1); the pre-condition of the parallel composition implies the pre-conditions of all event systems (Premise 2); the overall post-condition is a logical consequence of the post-conditions of all event systems (Premise 3); since an action transition of the concurrent system is performed by one of its event systems, the guarantee condition $Gs_\kappa$ of each event system is a subset of the overall guarantee condition G (Premise 4); a transition of the overall environment R must be allowed by the rely condition $Rs_\kappa$ of each event system κ (Premise 5); and an action transition of an event system κ must be included in the rely condition of every other event system κ′, where κ ≠ κ′ (Premise 6).
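For finite state spaces, stability and the cross-condition of Premise 6 reduce to simple set checks. Here is a Python sketch with hypothetical helper names, where relations are sets of state pairs and event systems are keyed by name.

```python
from itertools import product
from typing import Dict, Hashable, Set, Tuple

Rel = Set[Tuple[Hashable, Hashable]]


def stable(p: Set[Hashable], r: Rel) -> bool:
    """stable(P, R): every R-step from a state in P lands in P."""
    return all(t in p for (s, t) in r if s in p)


def guarantees_respect_relies(guar: Dict[str, Rel], rely: Dict[str, Rel]) -> bool:
    """Premise 6: the guarantee of each event system is within every other one's rely."""
    return all(guar[k] <= rely[k2] for k, k2 in product(guar, rely) if k != k2)
```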

To prove loop termination, loop invariants are parametrized by a logical variable α. To show total correctness of a loop statement, it suffices to prove the following proposition, where loopinv(α) is the parametrized invariant and the logical variable is used to define a convergent relation showing that the number of loop iterations is finite.

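$$
\begin{aligned}
& \vdash P\ \mathbf{sat}\ \langle \mathit{loopinv}(\alpha) \cap \{\!|\,\alpha > 0\,|\!\},\ R,\ G,\ \exists \beta < \alpha.\ \mathit{loopinv}(\beta) \rangle \\
& \wedge\ \mathit{loopinv}(\alpha) \cap \{\!|\,\alpha > 0\,|\!\} \subseteq \{\!|\,b\,|\!\} \ \wedge\ \mathit{loopinv}(0) \subseteq \{\!|\,\neg b\,|\!\} \\
& \wedge\ \forall s \in \mathit{loopinv}(\alpha).\ (s, t) \in R \longrightarrow \exists \beta \le \alpha.\ t \in \mathit{loopinv}(\beta)
\end{aligned}
$$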

#### **5.2 Correctness Specification**

Using the compositional reasoning of π-Core, the correctness of Zephyr memory management can be specified and verified through the rely-guarantee specification of each event. The functional correctness of a kernel service is specified by its pre/post-conditions. Invariant preservation, preservation of the memory configuration, and separation of local variables are specified in the guarantee condition of each service.

The guarantee condition for both memory services is defined as:

$$
\begin{aligned}
\mathbf{Mem\text{-}pool\text{-}alloc\text{-}guard}\ t \equiv\ & \underbrace{Id}_{(1)} \cup \Big(\underbrace{\mathit{gvars\text{-}conf\text{-}stable}}_{(2)} \cap \{(s, r).\ \underbrace{(\mathit{cur}\ s \neq \mathit{Some}\ t \longrightarrow \mathit{gvars\text{-}nochange}\ s\ r \wedge \mathit{lvars\text{-}nochange}\ t\ s\ r)}_{(3.1)} \\
& \wedge\ \underbrace{(\mathit{cur}\ s = \mathit{Some}\ t \longrightarrow \mathit{inv}\ s \longrightarrow \mathit{inv}\ r)}_{(3.2)} \wedge \underbrace{(\forall t'.\ t' \neq t \longrightarrow \mathit{lvars\text{-}nochange}\ t'\ s\ r)}_{(4)}\}\Big)
\end{aligned}
$$

This relation states that the *alloc* and *free* services may leave the state unchanged (1), e.g., when blocked in an await or when selecting a branch of a conditional statement. If they change the state, then: (2) the static configuration of the memory pools in the model does not change; (3.1) if the scheduled thread is not the thread t invoking the event, then neither the global variables nor the local variables of t change (since t is blocked in an *Await*, as explained in Sect. 4.1); (3.2) if it is, then the relation preserves the memory invariant, so each step of the event needs to preserve the invariant; and (4) a thread does not change the local variables of other threads.

Using the π-Core proof rules, we verify that the invariant introduced in Sect. 3 is preserved by all events. Additionally, we prove that, when starting in a valid memory configuration given by the invariant, if the allocation service does not return an error code, then it returns a valid memory block whose size is greater than or equal to the requested size. This property is specified by the following pre- and postcondition:

**Mem-pool-alloc-pre** *t* ≡ {*s*. *inv s* ∧ *allocating-node s t* = *None* ∧ *freeing-node s t* = *None*}

**Mem-pool-alloc-post** *t p sz timeout* ≡

{*s*. *inv s* ∧ *allocating-node s t* = *None* ∧ *freeing-node s t* = *None*

∧ (*timeout* = *FOREVER* −→

(*ret s t* = *ESIZEERR* ∧ *mempoolalloc-ret s t* = *None* ∨

*ret s t* = *OK* ∧ (∃ *mblk*. *mempoolalloc-ret s t* = *Some mblk* ∧ *mblk-valid s p sz mblk*)))

∧ (*timeout* = *NOWAIT* −→

((*ret s t* = *ENOMEM* ∨ *ret s t* = *ESIZEERR*) ∧ *mempoolalloc-ret s t* = *None*) ∨

(*ret s t* = *OK* ∧ (∃ *mblk*. *mempoolalloc-ret s t* = *Some mblk* ∧ *mblk-valid s p sz mblk*)))

∧ (*timeout* > *0* −→

((*ret s t* = *ETIMEOUT* ∨ *ret s t* = *ESIZEERR*) ∧ *mempoolalloc-ret s t* = *None*) ∨ (*ret s t* = *OK* ∧ (∃ *mblk*. *mempoolalloc-ret s t* = *Some mblk* ∧ *mblk-valid s p sz mblk*)))}

If a thread requests a memory block in mode *FOREVER*, it either successfully allocates a valid memory block or fails with *ESIZEERR* if the requested size is larger than the maximum block size *max_sz* of the memory pool. If the thread requests a memory block in mode *NOWAIT*, it may also get the result *ENOMEM* if no block is available. If the thread requests a block with a timeout (*timeout* > 0), it gets the result *ETIMEOUT* if no block becomes available within *timeout* milliseconds.
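The return-code part of the postcondition can be phrased as a predicate over the outcome of one call. In this Python sketch, `FOREVER = -1` and `NOWAIT = 0` are assumptions mirroring Zephyr's `K_FOREVER`/`K_NO_WAIT`, the block-validity check is passed in as a hypothetical callback, and the *inv* and *None* conjuncts of the postcondition are omitted.

```python
from typing import Callable, Optional, Set

FOREVER = -1  # assumption: stands in for Zephyr's K_FOREVER
NOWAIT = 0    # assumption: stands in for Zephyr's K_NO_WAIT


def allowed_errors(timeout: int) -> Set[str]:
    """Error codes the postcondition permits for each waiting mode."""
    if timeout == FOREVER:
        return {"ESIZEERR"}
    if timeout == NOWAIT:
        return {"ENOMEM", "ESIZEERR"}
    if timeout > 0:
        return {"ETIMEOUT", "ESIZEERR"}
    raise ValueError("timeout is not FOREVER, NOWAIT, or positive")


def alloc_post_ok(timeout: int, ret: str, blk: Optional[object],
                  blk_valid: Callable[[object], bool]) -> bool:
    """Either OK with a valid block, or a permitted error with no block."""
    if ret == "OK":
        return blk is not None and blk_valid(blk)
    return ret in allowed_errors(timeout) and blk is None
```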

The property is indeed weak: even if the memory contains a block able to satisfy the requested size before the allocation service is invoked, another thread running concurrently may take that block first during the execution of the service. For the same reason, the released block may be taken by another concurrent thread before the end of the release service.

#### **5.3 Correctness Proof**

In the π-Core system, a rely-guarantee specification is verified by inductively applying the proof rules to each system event and discharging the proof obligations the rules generate. Typically, these proof obligations require proving the stability of the pre- and post-conditions, i.e., that changes made by the environment preserve them, and showing that a statement executed from a state satisfying the precondition reaches a state satisfying the postcondition.

To prove termination of the loop statement in *free block* shown in Fig. 4, we define the loop invariant with the logical variable α as follows.

```
mp-free-loopinv t b α ≡ {| ... ∧ ´inv ∧ level b < length (´lsizes t)
  ∧ (∀ ii<length (´lsizes t). ´lsizes t ! ii = (max-sz (´mem-pool-info (pool b))) div (4 ^ ii))
  ∧ ´bn t < length (bits (levels (´mem-pool-info (pool b)) ! (´lvl t)))
  ∧ ´bn t = (block b) div (4 ^ (level b − ´lvl t)) ∧ ´lvl t ≤ level b
  ∧ (´free-block-r t −→ (∃ blk. ´freeing-node t = Some blk ∧ pool blk = pool b
                                 ∧ level blk = ´lvl t ∧ block blk = ´bn t)
                         ∧ ´alloc-memblk-data-valid (pool b) (the (´freeing-node t)))
  ∧ (¬ ´free-block-r t −→ ´freeing-node t = None) |}
  ∩ {| α = (if ´freeing-node t ≠ None then ´lvl t + 1 else 0) |}
```

Here, freeing-node and lvl are local variables storing, respectively, the node being freed and the level that node belongs to. In the body of the loop, if ´lvl t > 0 and partner-bit-free holds, then lvl is decremented by one at the end of the body; otherwise, ´freeing-node t = None. Hence, at the end of the loop body, either α decreases or α = 0. If α = 0, then ´freeing-node t = None, and thus the loop condition does not hold (¬ ´free-block-r t), which concludes the termination proof of *free_block*.
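The variant argument can be checked on a small Python model that abstracts the bitmap away: whether the partners are free at a level is supplied as a hypothetical predicate, and the recorded α follows the reasoning above (lvl + 1 while a node is being freed, 0 once *freeing-node* is *None*).

```python
from typing import Callable, List


def free_block_variant_trace(level: int,
                             partners_free: Callable[[int], bool]) -> List[int]:
    """Values of alpha at each loop head, plus the final value after the loop."""
    lvl = level
    freeing = True                     # freeing-node is Some _ (free-block-r holds)
    trace = []
    while freeing:
        trace.append(lvl + 1)          # alpha = lvl + 1 while a node is being freed
        if lvl > 0 and partners_free(lvl):
            lvl -= 1                   # merge and move to the parent level
        else:
            freeing = False            # freeing-node := None
    trace.append(0)                    # alpha = 0 once freeing-node is None
    return trace
```

A strictly decreasing trace ending in 0 is exactly the convergence the loop rule asks for.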

Due to concurrency, fairness must be considered to prove termination of the loop statement in *k_mem_pool_alloc* (Lines 23–33 in Fig. 1). On the one hand, when a thread requests a memory block in *FOREVER* mode, it is possible that no block ever becomes available because other threads never release their allocated blocks. On the other hand, even when other threads release blocks, the available blocks may always be taken first by competing threads.

#### **6 Evaluation and Results**

*Evaluation.* The verification conducted in this work targets Zephyr v1.8.0, released in 2017. The C code of the buddy memory management is ≈400 lines, not counting blank lines and comments. Table 1 shows statistics on the effort and size of the proofs in the Isabelle/HOL theorem prover. In total, the models and mechanized verification consist of ≈28,000 lines of specification and proofs, and the total effort is ≈12 person-months. The specification and proofs of π-Core are reusable for the verification of other systems.


**Table 1.** Specification and proof statistics

*Bugs in Zephyr.* During the formal verification, we found 3 bugs in the C code of Zephyr. The first two bugs are critical and have been repaired in the latest release of Zephyr. To avoid the third one, callers of *k_mem_pool_alloc* have to constrain the argument *size_t size*.

**(1) Incorrect block split:** this bug is located in the loop at Line 11 of the *k_mem_pool_alloc* service, shown in Fig. 1. The *level_empty* function checks whether a pool p has blocks in the free list at level *alloc_l*. Concurrent threads may release a memory block at that level, making the call *level_empty(p, alloc_l)* return *false* and stopping the loop. In that case, the service allocates a memory block of a bigger capacity at a level i, but it still sets the level number of the block to *alloc_l* at Line 15. The service thus allocates a larger block to the requesting thread, causing an internal fragmentation of $max\_sz/4^i - max\_sz/4^{alloc\_l}$ bytes. When this block is released, it is inserted into the free list at level *alloc_l* rather than at level i, causing an external fragmentation of $max\_sz/4^i - max\_sz/4^{alloc\_l}$ bytes. We fix the bug in our specification by removing the condition *level_empty(p, alloc_l)*.
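To give a sense of scale for bug (1), the wasted bytes follow directly from the block-size formula. The pool size and levels in this Python sketch are hypothetical.

```python
def block_size(max_sz: int, level: int) -> int:
    """Block size at a level: max_sz / 4^level."""
    return max_sz // 4 ** level


def split_bug_waste(max_sz: int, i: int, alloc_l: int) -> int:
    """Bytes lost when a level-i block is recorded as a level-alloc_l block (i < alloc_l)."""
    return block_size(max_sz, i) - block_size(max_sz, alloc_l)


# Hypothetical pool: 4096-byte level-0 blocks; the request fits level 3 (64 bytes),
# but the loop stopped early and the block came from level 1 (1024 bytes).
waste = split_bug_waste(4096, 1, 3)   # 1024 - 64 = 960
```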

**(2) Incorrect return from** *k_mem_pool_alloc***:** this bug is found at Line 26 in Fig. 1. When a suitable free block is allocated by another thread, the *pool_alloc* function returns *EAGAIN* at Line 9 to ask the thread to retry the allocation. When a thread invokes *k_mem_pool_alloc* in *FOREVER* mode and this case occurs, the service returns *EAGAIN* immediately. However, a thread invoking *k_mem_pool_alloc* in *FOREVER* mode should keep retrying until it succeeds. We repair the bug by removing the condition ret == EAGAIN at Line 26. As explained in the comments of the C code, *EAGAIN* should not be returned to threads invoking the service. Moreover, the *return EAGAIN* at Line 34 actually corresponds to a timeout. Thus, we introduce a new return code *ETIMEOUT* in our specification.

**(3) Non-termination of** *k_mem_pool_alloc***:** we have discussed that the loop statement at Lines 23–33 in Fig. 1 does not necessarily terminate. However, it should terminate in certain cases, and the C code violates this. When a thread requests a memory block in *FOREVER* mode and the requested size is larger than *max_sz*, the maximum block size, the loop at Lines 23–33 in Fig. 1 never finishes, since *pool_alloc* always returns *ENOMEM*. The reason is that the "*return ENOMEM*" at Line 6 does not distinguish two cases, alloc_l < 0 and free_l < 0. In the first case, the requested size is larger than *max_sz*, and the kernel service should return immediately. In the second case, there are no free blocks larger than the requested size, and the service retries until some free block becomes available. We repair the bug by splitting the *if* statement at Lines 4–7 into these two cases and introducing a new return code *ESIZEERR* in our specification. We then change the condition at Lines 25–26 to check whether the returned value is *ESIZEERR* instead of *ENOMEM*.
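A Python sketch of the repaired failure classification follows. It is a hypothetical reconstruction of the level selection (computing alloc_l and free_l, then distinguishing the two failure cases), with assumptions of its own: the 4-byte alignment applied to level sizes in the C code is omitted, and free lists are summarized by per-level counts.

```python
from typing import List, Tuple

OK, ENOMEM, ESIZEERR = "OK", "ENOMEM", "ESIZEERR"


def select_levels(max_sz: int, n_levels: int, free_counts: List[int],
                  size: int) -> Tuple[int, int]:
    """alloc_l: deepest level whose blocks still fit `size`;
    free_l: largest level index <= alloc_l with a non-empty free list."""
    alloc_l = free_l = -1
    lsz = max_sz
    for i in range(n_levels):
        if i > 0:
            lsz //= 4                 # block size at level i
        if lsz < size:
            break
        alloc_l = i
        if free_counts[i] > 0:
            free_l = i
    return alloc_l, free_l


def pool_alloc_status(max_sz: int, n_levels: int, free_counts: List[int],
                      size: int) -> str:
    alloc_l, free_l = select_levels(max_sz, n_levels, free_counts, size)
    if alloc_l < 0:
        return ESIZEERR   # size > max_sz: no retry can ever succeed
    if free_l < 0:
        return ENOMEM     # nothing free now; a later release may help
    return OK


def forever_should_retry(status: str) -> bool:
    """With the split codes, a FOREVER caller keeps waiting only on ENOMEM."""
    return status == ENOMEM
```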

#### **7 Conclusion and Future Work**

In this paper, we have developed a formal specification, at the low-level design, of the concurrent buddy memory management of Zephyr RTOS. Using the rely-guarantee technique in the π-Core framework, we have formally verified a set of properties critical for OS kernels, such as invariant preservation and preservation of the memory configuration. Finally, we identified three bugs, two of them critical, in the C code of Zephyr.

Our work explores the challenges and cost of certifying concurrent OSs at the highest assurance level. Defining the properties and rely-guarantee relations is complex, and the verification task becomes expensive: at the low-level design, the lines of specification and proof (LOS/LOP) amount to 40 times the lines of C code. Next, we plan to verify other modules of Zephyr, which may be easier due to their simpler data structures and algorithms. For the purpose of fully formal verification of OSs at the source-code level, we will replace the imperative language of π-Core with a more expressive one and add a verification condition generator (VCG) to reduce the cost of verification.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Violat: Generating Tests of Observational Refinement for Concurrent Objects**

Michael Emmi<sup>1</sup>(B) and Constantin Enea<sup>2</sup>

<sup>1</sup> SRI International, New York, NY, USA michael.emmi@sri.com

<sup>2</sup> Université de Paris, IRIF, CNRS, 75013 Paris, France cenea@irif.fr

**Abstract.** High-performance multithreaded software often relies on optimized implementations of common abstract data types (ADTs) like counters, key-value stores, and queues, i.e., *concurrent objects*. By using fine-grained and non-blocking mechanisms for efficient inter-thread synchronization, these implementations are vulnerable to violations of ADT-consistency which are difficult to detect: bugs can depend on specific combinations of method invocations and argument values, as well as rarely-occurring thread interleavings. Even given a bug-triggering interleaving, detection generally requires unintuitive test assertions to capture inconsistent combinations of invocation return values.

In this work we describe the Violat tool for generating tests that witness violations to atomicity, or weaker consistency properties. Violat generates self-contained and efficient programs that test *observational refinement*, i.e., substitutability of a given ADT with a given implementation. Our approach is both sound and complete in the limit: for every consistency violation there is a failed execution of some test program, and every failed test signals an actual consistency violation. In practice we compromise soundness for efficiency via random exploration of test programs, yielding probabilistic soundness instead. Violat's tests reliably expose ADT-consistency violations using off-the-shelf approaches to concurrent test validation, including stress testing and explicit-state model checking.

#### **1 Introduction**

Many mainstream software platforms including Java and .NET support multithreading to enable parallelism and reactivity. Programming multithreaded code effectively is notoriously hard, and prone to data races on shared memory accesses, or deadlocks on the synchronization used to protect accesses. Rather than confronting these difficulties, programmers generally prefer to leverage libraries providing *concurrent objects* [19,29], i.e., optimized thread-safe implementations of common abstract data types (ADTs) like counters, key-value stores, and queues. For instance, Java's concurrent collections include implementations which eschew the synchronization bottlenecks associated with lock-based mutual exclusion, opting instead for non-blocking mechanisms [28] provided by hardware operations like *atomic compare and exchange*.

Concurrent object implementations are themselves vulnerable to elusive bugs: even with effective techniques for exploring the space of thread interleavings, like stress testing or model checking [7,30,47], bugs often depend on specific combinations of method invocations and argument values. Furthermore, even recognizing whether a given execution is *correct* is non-trivial, since recognition generally requires unintuitive test assertions to identify inconsistent combinations of return values. Technically, correctness amounts to *observational refinement* [18,21,32], which captures the substitutability of an ADT with an implementation [23]: any combination of values admitted by a given implementation is also admitted by the given ADT specification.

In this work we describe an approach to generating tests of observational refinement for concurrent objects, as implemented by the Violat tool, which we use to discover violations to atomicity (and weaker consistency properties) in widely-used concurrent objects [9,10,12]. Unlike previous approaches based on *linearizability* [4,20,46], Violat generates self-contained test programs which do not require enumerating linearizations dynamically *per execution*; instead, Violat statically precomputes the ADT-admitted return-value outcomes *per test program*, once, prior to testing. Despite this optimization, the approach is both sound and complete in the limit, i.e., for every consistency violation there is a failed execution of some test program, and every failed test witnesses an actual consistency violation. In practice, we compromise soundness for efficiency via random exploration of test programs, achieving probabilistic soundness instead.

Besides improving the efficiency of test execution, Violat's self-contained tests can be validated by both stress testers and model checkers, and double as regression and conformance tests. Our previous works [9,10,12] demonstrate that Violat's tests reliably expose ADT-consistency violations in Java implementations using the Java Concurrency Stress testing tool [42]. In particular, Violat has uncovered atomicity violations in over 50 methods from Java's concurrent collections; many of these violations appear to correspond to the *weakly-consistent* behavior mentioned in the methods' documentation, while others indicate confirmed implementation bugs, which we have reported.

Previous work used Violat in empirical studies, without artifact evaluation [9,10,12]. This article is the first to consider Violat itself for evaluation, the first to describe its implementation and usage, and includes several novel extensions. For instance, in addition to stress testing, Violat now includes an integration with Java Pathfinder [47]; besides enabling complete systematic coverage of a given test program, this integration enables the output of the execution traces leading to consistency violations, thus facilitating diagnosis and repair. Furthermore, Violat is now capable of generating tests of any user-provided implementation, in addition to those distributed with Java.

#### **2 Overview of Test Generation with Violat**

Violat generates self-contained programs to test the observational refinement of a given concurrent object implementation with respect to its abstract data type (ADT), according to Fig. 1. While its methodology is fairly platform agnostic, Violat currently integrates with the Java platform. Accordingly, its input includes the fully-qualified name of a single Java class, which is assumed to be available either on the system classpath, or in a user-provided Java archive (JAR); its output is a sequence of Java classes which can be tested with off-the-shelf back-end analysis engines, including the Java Concurrency Stress testing tool [42] and Java Pathfinder [47]. Our current implementation integrates directly with both back-ends, and thus reports test results directly, signaling any discovered consistency violations.

**Fig. 1.** Violat generates tests by enumerating program schemas invoking a given concurrent object, annotating those schemas with the expected outcomes of invocations according to ADT specifications, and translating annotated schemas to executable tests.

Violat generates tests according to a three-step pipeline. The first step, described in Sect. 3, enumerates test program *schemas*, i.e., concise descriptions of programs as parallel sequences of invocations of the given concurrent object's methods. For example, Fig. 2 lists several test schemas for Java's ConcurrentHashMap. The second step, described in Sect. 4, annotates each schema with a set of expected *outcomes*, i.e., the combinations of return values among the given schema's invocations which are admitted according to the given object's ADT specification. The final step, described in Sect. 5, translates each schema into a self-contained<sup>1</sup> Java class.

Technically, to guide the enumeration of schemas and calculation of outcomes, Violat requires a specification of the given concurrent object, describing constructor and method signatures. While this could be generated automatically from the object's bytecode, our current implementation asks the user to input this specification in JSON format. By also indicating whether methods are read-only or weakly-consistent, the user can provide hints to improve schema enumeration and outcome calculation. For instance, excessive generation of programs with only read-only methods is unlikely to uncover consistency violations, and weakly-consistent ADT methods generally allow additional outcomes – see Emmi and Enea [12]. Furthermore, Violat attempts to focus the blame for discovered violations by constructing tests with a small number of specified *untrusted* methods, e.g., just one.

<sup>1</sup> The generated class imports only the given concurrent object and a few basic java.util classes.

#### **3 Test Enumeration**

To enumerate test programs effectively, Violat considers a simple representation of program *schemas*, as depicted in Fig. 2. We write schemas with a familiar notation, as parallel compositions *{*...*}*||*{*...*}* of method-invocation sequences. Intuitively, schemas capture parallel threads invoking sequences of methods of a given concurrent object. Besides the parallelism, these schemas include only trivial control and data flow. For instance, we exclude conditional statements and loops, as well as passing return values as arguments, in favor of straight-line code with literal argument values. Nevertheless, this simple notion is expressive enough to capture any possible *outcome*, i.e., combination of invocation return values, of programs with arbitrarily complex control flow, data flow, and synchronization. To see this, consider any outcome *y* admitted by some execution of a program with arbitrarily complex control and data flow, in which methods are invoked with argument values *x*, collectively. The schema in which each thread invokes, with the literal argument values *x*, the same sequence of methods invoked by the corresponding thread in that execution is guaranteed to admit the same outcome *y*.
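As a minimal sketch, assuming a plain in-memory representation (the `Schema` class and its methods are hypothetical; Violat itself is a Node.js application and its internal representation may differ), a schema can be modeled as a list of threads, each a straight-line list of invocations with literal arguments:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical schema representation: no branches, loops, or data flow,
// only parallel threads of invocations with literal argument values.
final class Schema {
    // Each inner list is one thread; each string is one invocation, e.g. "put(1,1)".
    final List<List<String>> threads;

    Schema(List<List<String>> threads) {
        this.threads = threads;
    }

    // Renders the schema in the { ... } || { ... } notation used in the text.
    String render() {
        return threads.stream()
                .map(t -> "{ " + String.join("; ", t) + " }")
                .collect(Collectors.joining(" || "));
    }

    // Total number of invocations across all threads.
    int invocationCount() {
        return threads.stream().mapToInt(List::size).sum();
    }
}
```

For example, the threads `[get(1), containsValue(1)]` and `[put(1,1), put(0,1), put(1,0)]` render as `{ get(1); containsValue(1) } || { put(1,1); put(0,1); put(1,0) }`.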


**Fig. 2.** Program schemas generated by Violat for Java's ConcurrentHashMap class, along with outcomes which are observed in testing, yet not predicted by Violat.

For a given concurrent object, Violat enumerates schemas according to a few configurable parameters, including bounds on the number of threads, invocations, and (primitive) values. By default, Violat generates schemas with exactly 2 threads, between 3 and 6 invocations, and exactly 2 values. While our initial implementation enumerated schemas systematically according to a well-defined order, empirically we found that this strategy spends too much time in neighborhoods of uninteresting schemas, i.e., those which do not expose violations. Ultimately we adopted a pseudorandom enumeration which constructs each schema independently by randomly choosing the number of threads, invocations, and values, within the given parameter bounds, and randomly populating threads with invocations. Methods are selected according to a weighted random choice, in which read-only and untrusted methods have weight 1 and trusted mutator methods have weight 3. The read-only and trusted designations are provided by class specifications – see Sect. 2. Integer argument values are chosen randomly between 0 and 1, according to the default value bound; generic-typed arguments are assumed to be integers. Collection and map values are constructed from randomly-chosen integer values, up to size 2. In principle, all of these bounds are configurable, but we have found these defaults to work reasonably well.
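The weighted pseudorandom choice can be sketched as follows, assuming a hypothetical `Method` descriptor carrying the read-only and trusted designations (Violat reads these from its JSON class specification, whose format is not shown here). Weights follow the text: 1 for read-only and untrusted methods, 3 for trusted mutators. Placing one invocation on each thread before distributing the rest is our own assumption, made to avoid empty threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of pseudorandom schema construction; not Violat's API.
final class RandomSchemas {
    static final class Method {
        final String name;
        final int arity;         // number of integer arguments
        final boolean readOnly;
        final boolean trusted;

        Method(String name, int arity, boolean readOnly, boolean trusted) {
            this.name = name;
            this.arity = arity;
            this.readOnly = readOnly;
            this.trusted = trusted;
        }

        // Trusted mutators get weight 3; read-only and untrusted methods get 1.
        int weight() {
            return (!readOnly && trusted) ? 3 : 1;
        }
    }

    // Weighted random choice: each method is picked with probability
    // proportional to its weight.
    static Method pick(List<Method> methods, Random rnd) {
        int total = 0;
        for (Method m : methods) total += m.weight();
        int r = rnd.nextInt(total);
        for (Method m : methods) {
            r -= m.weight();
            if (r < 0) return m;
        }
        throw new IllegalStateException("unreachable for a non-empty method list");
    }

    // Builds one schema independently: random thread and invocation counts
    // within the bounds, integer arguments drawn from 0..valueBound-1.
    static List<List<String>> generate(List<Method> methods, Random rnd,
                                       int minThreads, int maxThreads,
                                       int minInvocations, int maxInvocations,
                                       int valueBound) {
        int threads = minThreads + rnd.nextInt(maxThreads - minThreads + 1);
        int invocations = minInvocations + rnd.nextInt(maxInvocations - minInvocations + 1);
        List<List<String>> schema = new ArrayList<>();
        for (int t = 0; t < threads; t++) schema.add(new ArrayList<>());
        for (int i = 0; i < invocations; i++) {
            Method m = pick(methods, rnd);
            List<String> args = new ArrayList<>();
            for (int a = 0; a < m.arity; a++) {
                args.add(Integer.toString(rnd.nextInt(valueBound)));
            }
            // Assumption: the first invocations go one per thread, the rest
            // to uniformly random threads.
            int target = i < threads ? i : rnd.nextInt(threads);
            schema.get(target).add(m.name + "(" + String.join(",", args) + ")");
        }
        return schema;
    }
}
```

With the default bounds, a call such as `RandomSchemas.generate(methods, new Random(seed), 2, 2, 3, 6, 2)` yields two threads, 3 to 6 invocations in total, and integer arguments in {0, 1}.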

Note that while the manifestation of a given concurrency bug can, in principle, rely on large bounds on threads, invocations, and values, recent studies demonstrate that the majority (96%) can be reproduced with just 2 threads [25]. Furthermore, while our current implementation adheres to the simple notion of schema in which all threads are executed in parallel, Violat can easily be extended to handle a more complex notion of schema in which threads are partially ordered, thus capturing arbitrary program synchronization. Nevertheless, this simple notion seems effective at exposing violations without requiring additional synchronization – see Emmi and Enea [12, Section 5.2].

#### **4 Computing Expected Outcomes**

To capture violations to observational refinement, Violat computes the set of *expected outcomes*, i.e., those admitted by a given concurrent object's abstract data type (ADT), for each program schema. Violat essentially follows the approach of Line-Up [4] by computing expected outcomes from sequential executions of the given implementation. While this approach assumes that the sequential behavior of a given implementation does adhere to its implicit ADT specification – and that the outcomes of concurrent executions are also outcomes of sequential executions – there is typically no practical alternative, since behavioral ADT specifications are rarely provided.

Violat computes the expected outcomes of a given schema once, by enumerating all possible shuffles of threads' invocations, and recording the return values of each shuffle when executed by the given implementation. For instance, there are 10 ways to shuffle the threads of the schema

*{* get(1); containsValue(1) *}* || *{* put(1,1); put(0,1); put(1,0) *}*

from Fig. 2, including the sequence

get(1); put(1,1); put(0,1); put(1,0); containsValue(1).

Executing Java's ConcurrentHashMap on this shuffle yields the values null, null, null, 1, and true, respectively. To construct the generated outcome, Violat reorders the return values according to the textual order of their corresponding invocations in the given schema; since containsValue is second in this order, after get, the generated outcome is null, true, null, null, 1. Among the 10 possible shuffles of this schema, there are only four unique outcomes – shown later in Figs. 3 and 4.
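The computation just described can be sketched in Java for this one schema. The sketch hard-codes the five invocations (the `ExpectedOutcomes` class and its methods are hypothetical, not Violat's implementation), enumerates the 10 order-preserving merges of the two threads, runs each merge sequentially on a fresh `ConcurrentHashMap`, and stores the return values in program-text order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch for the schema
//   { get(1); containsValue(1) } || { put(1,1); put(0,1); put(1,0) }
final class ExpectedOutcomes {
    // Invocation ids in program-text order:
    // 0 get(1), 1 containsValue(1), 2 put(1,1), 3 put(0,1), 4 put(1,0).
    static Object invoke(ConcurrentHashMap<Integer, Integer> map, int id) {
        switch (id) {
            case 0: return map.get(1);
            case 1: return map.containsValue(1);
            case 2: return map.put(1, 1);
            case 3: return map.put(0, 1);
            case 4: return map.put(1, 0);
            default: throw new IllegalArgumentException("unknown invocation " + id);
        }
    }

    // Enumerates all merges of t1 and t2 that preserve each thread's order.
    static void shuffles(List<Integer> t1, int i, List<Integer> t2, int j,
                         List<Integer> prefix, List<List<Integer>> out) {
        if (i == t1.size() && j == t2.size()) {
            out.add(new ArrayList<>(prefix));
            return;
        }
        if (i < t1.size()) {
            prefix.add(t1.get(i));
            shuffles(t1, i + 1, t2, j, prefix, out);
            prefix.remove(prefix.size() - 1);
        }
        if (j < t2.size()) {
            prefix.add(t2.get(j));
            shuffles(t1, i, t2, j + 1, prefix, out);
            prefix.remove(prefix.size() - 1);
        }
    }

    static List<List<Integer>> allShuffles() {
        List<List<Integer>> out = new ArrayList<>();
        shuffles(List.of(0, 1), 0, List.of(2, 3, 4), 0, new ArrayList<>(), out);
        return out;
    }

    // Runs each shuffle sequentially and records results by invocation id,
    // i.e., in program-text order rather than execution order.
    static Set<List<Object>> expected() {
        Set<List<Object>> outcomes = new LinkedHashSet<>();
        for (List<Integer> shuffle : allShuffles()) {
            ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
            Object[] results = new Object[5];
            for (int id : shuffle) {
                results[id] = invoke(map, id);
            }
            outcomes.add(Arrays.asList(results));
        }
        return outcomes;
    }
}
```

Under these assumptions, `allShuffles()` yields 10 shuffles and `expected()` yields four distinct outcomes: (null, false, null, null, 1), (null, true, null, null, 1), (1, true, null, null, 1), and (0, true, null, null, 1). Because `get` and `containsValue` only read, the three `put` calls always return null, null, and 1.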

**Fig. 3.** Code generated for the containsValue schema of Fig. 2 for Java Pathfinder. Code generation for jcstress is similar, but conforms to that tool's idiomatic test format, using annotations as well as built-in thread and outcome management.

Note that in contrast to existing approaches based on *linearizability* [20], including Line-Up [4], which enumerate linearizations *per execution* of a given program, Violat only enumerates linearizations once *per schema*. This is possible for two reasons. First, by considering simple test programs in which all invocations are known *statically*, we know the precise set of invocations (including argument values) to linearize even before executing the program. Second, according to sequential happens-before consistency [12], we consider the recording of real-time ordering among invocations infeasible on modern platforms like Java and C++11, which provide only weak ordering guarantees according to a platform-defined happens-before relation. This enables the static prediction of ordering constraints among invocations. While this static enumeration is also exponential in the number of invocations, its cost is paid once per schema rather than once per execution; it thus contributes an additive rather than a multiplicative factor to testing time, amounting to significant performance gains.
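As a back-of-the-envelope illustration of the additive versus multiplicative cost, assume (as a simplification) that a per-execution checker may need to try every linearization of an execution, and count one unit per linearization replay or hash-table lookup. For a two-thread schema with $n_1$ and $n_2$ invocations, the number of shuffles $L$ is a binomial coefficient; the containsValue schema of Sect. 4 has $L = 10$, and jcstress runs nearly $E = 4 \times 10^6$ executions of it (Sect. 5):

$$
L = \binom{n_1 + n_2}{n_1}, \qquad L = \binom{5}{2} = 10 \quad \text{for } n_1 = 2,\ n_2 = 3
$$

$$
\underbrace{E \cdot L}_{\text{per execution}} \approx 4 \times 10^{6} \cdot 10 = 4 \times 10^{7}
\qquad \text{vs.} \qquad
\underbrace{L + E}_{\text{per schema}} \approx 10 + 4 \times 10^{6}
$$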


**Fig. 4.** Observed outcomes for the size method, recorded by Java Pathfinder and jcstress. Outcomes list return values in program-text order, e.g., get's return value is listed first.

#### **5 Code Generation and Back-End Integrations**

Once schemas are annotated with expected outcomes, the translation to actual test programs is fairly straightforward. Note that until this point, Violat is mainly agnostic to the underlying platform for which tests are being generated. The only exception is in computing the expected outcomes for schema linearizations, which executes the given concurrent object implementation as a stand-in oracle for its implicit ADT specification.

Figure 3 lists a simplification of the code generated for the containsValue schema of Fig. 2. The test program initializes a concurrent-object instance and a hash table of expected outcomes, then runs the schema's threads in parallel, recording the results of each invocation, and checks, after threads complete, whether the recorded outcome is expected. To avoid added inter-thread interference and the masking of potential weak-memory effects, each recorded result is isolated to a distinct cache line via Java's *@Contended* annotation. The actual generated code also includes exception handling, elided here for brevity.
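As a rough sketch of that structure (hypothetical, not Violat's actual output), the following plain-Java harness hard-codes the four outcomes obtained by running the 10 shuffles of this schema sequentially, runs the two threads once per call, and checks membership in the table. Exception handling and *@Contended* padding of the result slots are omitted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical harness for { get(1); containsValue(1) } || { put(1,1); put(0,1); put(1,0) }.
final class ContainsValueTest {
    // Outcomes list return values in program-text order.
    static final Set<List<Object>> EXPECTED = Set.of(
            outcome(null, false, null, null, 1),
            outcome(null, true, null, null, 1),
            outcome(1, true, null, null, 1),
            outcome(0, true, null, null, 1));

    static List<Object> outcome(Object... values) {
        return Arrays.asList(values);
    }

    // Runs the schema once with one Java thread per schema thread.
    static List<Object> runOnce() {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        Object[] r = new Object[5];
        Thread t1 = new Thread(() -> {
            r[0] = map.get(1);
            r[1] = map.containsValue(1);
        });
        Thread t2 = new Thread(() -> {
            r[2] = map.put(1, 1);
            r[3] = map.put(0, 1);
            r[4] = map.put(1, 0);
        });
        t1.start();
        t2.start();
        try {
            // join() makes the threads' writes to r visible here.
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining test threads", e);
        }
        return Arrays.asList(r);
    }

    static boolean isExpected(List<Object> observed) {
        return EXPECTED.contains(observed);
    }

    public static void main(String[] args) {
        int unexpected = 0;
        for (int i = 0; i < 10_000; i++) {
            List<Object> o = runOnce();
            if (!isExpected(o)) {
                unexpected++;
                System.out.println("unexpected outcome: " + o);
            }
        }
        System.out.println(unexpected + " unexpected outcomes");
    }
}
```

Starting fresh threads on every run leaves little room for the interleavings that expose the violation: the jcstress results reported in this section observe the unpredicted outcome only twice in nearly 4 million executions, so this naive loop will most likely report none. That gap is what the stress-testing and model-checking back-ends address.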

Our current implementation of Violat integrates with two analysis back-ends: the Java Concurrency Stress testing tool [42] (jcstress) and Java Pathfinder [47]. Figure 4 demonstrates the results of each tool on the code generated from the containsValue schema of Fig. 2. Each tool observes executions with the 4 expected outcomes, as well as executions yielding an outcome that Violat does not predict, thus signaling a violation to observational refinement (and atomicity). Java Pathfinder explores 18 program paths in a few seconds – achieving exhaustiveness via partial-order reduction [16] – while jcstress explores nearly 4 million executions in 1 s, observing the unpredicted outcome only twice. Aside from this example, Violat has uncovered consistency violations in over 50 methods of Java's concurrent collections [9,10,12].

#### **6 Usage**

Violat is implemented as a Node.js command-line application, available from GitHub and npm.<sup>2</sup> Its basic functionality is provided by the command:

```
$ violat-validator ConcurrentHashMap.json
...
violation discovered
{ put(0,1); size(); contains(1) } || { put(0,0); put(1,1) }
outcome                   OK   frequency
0, 0, true, null, null    X    7
0, 1, true, null, null    -    703
0, 2, true, null, null    -    94,636
null, 1, false, 1, null   -    2,263
null, 1, true, 1, null    -    59,917
null, 2, true, 1, null    -    4
...
```
reporting violations among 100 generated programs. User-provided classes, individual schemas, program limits, and particular back-ends can also be specified:

```
$ violat-validator MyConcurrentHashMap.json \
```
A full selection of parameters is available from the usage instructions:

```
$ violat-validator --help
```
#### **7 Related Work**

Terragni and Pezzè survey several works on test generation for concurrent objects [45]. Like Violat, Ballerina [31] and ConTeGe [33] enumerate tests randomly, while ConSuite [43], AutoConTest [44], and CovCon [6] exploit static analysis to compute potential shared-memory access conflicts to reduce redundancy among generated tests. Similarly, Omen [35–38], Narada [40], Intruder [39], and Minion [41] reduce redundancy by anticipating potential concurrency faults during sequential execution. Ballerina [31] and ConTeGe [33] compute linearizations, but only identify generic faults like data races, deadlocks, and exceptions, being neither sound nor complete for testing observational refinement: fault-free executions with un-admitted return-value combinations are false negatives, while faulting executions with admitted return-value combinations are generally false positives – many non-blocking concurrent objects exhibit data races by design. We consider the key innovations of these works, i.e., redundancy elimination, orthogonal and complementary to ours. While Pradel and Gross do consider subclass substitutability [34], they only consider programs with two concurrent invocations, and require exhaustive enumeration of the superclass's thread interleavings to calculate admitted outcomes. In contrast, Violat computes expected outcomes without interleaving method implementations, i.e., considering them atomic.

<sup>2</sup> https://github.com/michael-emmi/violat.

Others generate tests for memory consistency. TSOtool [17] generates random tests against the total-store order (TSO) model, while LCHECK [5] employs genetic algorithms. Mador-Haim et al. [26,27] generate litmus tests to distinguish several memory models, including TSO, partial-store order (PSO), relaxed-memory order (RMO), and sequential consistency (SC). CppMem [2] considers the C++ memory model, while Herd [1] considers release-acquire (RA) and Power in addition to the aforementioned models. McVerSi [8] employs genetic algorithms to enhance test coverage, while Wickerson et al. [48] leverage the Alloy model finder [22]. In some sense, these works generate tests of observational refinement for platforms implementing memory-system ADTs, i.e., with read and write operations, whereas Violat targets arbitrary ADTs, including collections with arbitrarily-rich sets of operations.

Violat more closely follows work on *linearizability* checking. Herlihy and Wing [20] established the soundness of linearizability for observational refinement, and Filipovic et al. [14] established completeness. Wing and Gong [49] developed a linearizability-checking algorithm, which was later adopted by Line-Up [4] and optimized by Lowe [24]; while Violat pays the exponential cost of enumerating linearizations once *per program*, these approaches pay that cost *per execution* – an exponential quantity itself. Gibbons and Korach [15] established NP-hardness of per-execution linearizability checking for arbitrary objects, while Emmi and Enea [11] demonstrate tractability for collections. Bouajjani et al. [3] propose polynomial-time approximations, and Emmi et al. [13] demonstrate efficient symbolic algorithms. Finally, Emmi and Enea [9,10,12] apply Violat to checking atomicity and weak-consistency of Java concurrent objects.

**Acknowledgement.** This work is supported in part by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant No. 678177).

#### **References**


terdam, The Netherlands, 30 October–4 November 2016, pp. 430–446. ACM (2016). https://doi.org/10.1145/2983990.2984040



#### Author Index

Albarghouthi, Aws I-278 André, Étienne I-520 Arcaini, Paolo I-401 Arcak, Murat I-591 Arechiga, Nikos II-137 Ashok, Pranav I-497 Avni, Guy I-630

Backes, John II-231 Bansal, Suguman I-60 Barbosa, Haniel II-74 Barrett, Clark I-443, II-23, II-74, II-116 Bayless, Sam II-231 Becker, Heiko II-155 Beckett, Ryan II-305 Beillahi, Sidi Mohamed II-286 Berkovits, Idan II-245 Biswas, Ranadeep II-324 Bloem, Roderick I-630 Bouajjani, Ahmed II-267, II-286 Brain, Martin II-116 Breck, Jason I-335 Busatto-Gaston, Damien I-572

Černý, Pavol I-140 Češka, Milan I-475 Chatterjee, Krishnendu I-630 Chen, Mingshuai I-650 Cimatti, Alessandro I-376 Coenen, Norine I-121 Cook, Byron II-231 Cyphert, John I-335

D'Antoni, Loris I-3, I-278, I-335 Damian, Andrei II-344 Darulova, Eva II-155, II-174 Davis, Jennifer A. I-366 Deshmukh, Jyotirmoy II-137 Dill, David L. I-443 Dimitrova, Rayna I-241 Dodge, Catherine II-231 Drăgoi, Cezara II-344 Dreossi, Tommaso I-432 Drews, Samuel I-278

Elfar, Mahmoud I-180 Emmi, Michael II-324, II-534 Enea, Constantin II-267, II-286, II-324, II-534 Ernst, Gidon II-208 Erradi, Mohammed II-267

Farzan, Azadeh I-200 Faymonville, Peter I-421 Fedyukovich, Grigory I-259 Feldman, Yotam M. Y. II-405 Feng, Shenghua I-650 Ferreira, Tiago I-3 Finkbeiner, Bernd I-121, I-241, I-421, I-609 Fränzle, Martin I-650 Fremont, Daniel J. I-432 Frohn, Florian II-426 Furbach, Florian I-355

Gacek, Andrew II-231 Ganesh, Vijay II-367 Gao, Sicun II-137 García Soto, Miriam I-297 Gastin, Paul I-41 Gavrilenko, Natalia I-355 Ghosh, Shromona I-432 Giannarakis, Nick II-305 Giesl, Jürgen II-426 Gomes, Victor B. F. I-387 Griggio, Alberto I-376 Guo, Xiaojie II-496 Gupta, Aarti I-259 Gurfinkel, Arie I-161, II-367

Hasuo, Ichiro I-401, I-520 Heljanko, Keijo I-355 Henzinger, Thomas A. I-297, I-630 Hong, Chih-Duo I-455 Hu, Alan J. II-231 Hu, Qinheping I-335 Huang, Derek A. I-443 Humphrey, Laura R. I-366 Hur, Chung-Kil II-445

Ibeling, Duligur I-443 Iosif, Radu II-43

Jagannathan, Suresh II-459 Jain, Mitesh I-553 Jonáš, Martin II-64 Julian, Kyle I-443

Kahsai, Temesghen II-231 Kang, Eunsuk I-219 Kapinski, James II-137 Katz, Guy I-443 Kim, Edward I-432 Kim, Eric S. I-591 Kincaid, Zachary II-97 Kingston, Derek B. I-366 Klein, Felix I-609 Kochenderfer, Mykel J. I-443 Kocik, Bill II-231 Kölbl, Martin I-79 Kong, Soonho II-137 Könighofer, Bettina I-630 Kotelnikov, Evgenii II-231 Křetínský, Jan I-475, I-497 Kukovec, Jure II-231

Lafortune, Stéphane I-219 Lal, Akash II-386 Lange, Julien I-97 Lau, Stella I-387 Lazarus, Christopher I-443 Lazić, Marijana II-245 Lee, Juneyoung II-445 Lesourd, Maxime II-496 Leue, Stefan I-79 Li, Jianwen II-3 Li, Yangjia II-187 Lim, Rachel I-443 Lin, Anthony W. I-455 Liu, Junyi II-187 Liu, Mengqi II-496 Liu, Peizun II-386 Liu, Tao II-187 Lopes, Nuno P. II-445 Losa, Giuliano II-245

Madhukar, Kumar I-259 Madsen, Curtis I-540 Magnago, Enrico I-376 Mahajan, Ratul II-305 Majumdar, Rupak I-455 Manolios, Panagiotis I-553 Markey, Nicolas I-22 McLaughlin, Sean II-231 Memarian, Kayvan I-387 Meyer, Roland I-355 Militaru, Alexandru II-344 Millstein, Todd I-315 Monmege, Benjamin I-572 Mukherjee, Sayan I-41 Murray, Toby II-208 Myers, Chris J. I-540 Myreen, Magnus O. II-155

Nagar, Kartik II-459 Neupane, Thakur I-540 Niemetz, Aina II-116 Nori, Aditya I-315 Nötzli, Andres II-23, II-74

Padhi, Saswat I-315 Padon, Oded II-245 Pajic, Miroslav I-180 Pichon-Pharabod, Jean I-387 Piskac, Ruzica I-609 Ponce-de-León, Hernán I-355 Prabhu, Sumanth I-259 Pranger, Stefan I-630 Preiner, Mathias II-116

Rabe, Markus N. II-84 Ravanbakhsh, Hadi I-432 Reed, Jason II-231 Reps, Thomas I-335 Reynier, Pierre-Alain I-572 Reynolds, Andrew II-23, II-74, II-116 Rieg, Lionel II-496 Roohi, Nima II-137 Roussanaly, Victor I-22 Roveri, Marco I-376 Rozier, Kristin Y. II-3 Rümmer, Philipp I-455 Rungta, Neha II-231

Sagiv, Mooly II-405 Sammartino, Matteo I-3 Sanán, David II-515 Sánchez, César I-121 Sankur, Ocan I-22, I-572 Santolucito, Mark I-609 Schilling, Christian I-297 Schledjewski, Malte I-421 Schwenger, Maximilian I-421 Seshia, Sanjit A. I-432, I-591 Sewell, Peter I-387 Shah, Parth I-443 Shao, Zhong II-496 Sharma, Rahul I-315 Shemer, Ron I-161 Shoham, Sharon I-161, II-245, II-405 Siegel, Stephen F. II-478 Silva, Alexandra I-3 Silverman, Jake II-97 Sizemore, John II-231 Solar-Lezama, Armando II-137 Srinivasan, Preethi II-231 Srivathsan, B. I-41 Stalzer, Mark II-231 Stenger, Marvin I-421 Strejček, Jan II-64 Subotić, Pavle II-231

Tatlock, Zachary II-155 Tentrup, Leander I-121, I-421 Thakoor, Shantanu I-443 Tinelli, Cesare II-23, II-74, II-116 Tizpaz-Niari, Saeid I-140 Tonetta, Stefano I-376 Torfah, Hazem I-241, I-421 Tripakis, Stavros I-219 Trivedi, Ashutosh I-140

Vandikas, Anthony I-200 Vardi, Moshe Y. I-60, II-3 Varming, Carsten II-231 Vazquez-Chanlatte, Marcell I-432 Vediramana Krishnan, Hari Govind II-367 Vizel, Yakir I-161, II-367 Volkova, Anastasia II-174

Waga, Masaki I-520 Wahl, Thomas II-386 Walker, David II-305 Wang, Shuling II-187 Wang, Yu I-180 Weininger, Maximilian I-497 Whaley, Blake II-231 Widder, Josef II-344 Wies, Thomas I-79 Wilcox, James R. II-405 Wu, Haoze I-443

Xu, Xiao II-43 Xue, Bai I-650

Ying, Mingsheng II-187 Ying, Shenggang II-187 Yoshida, Nobuko I-97

Zeleznik, Luka I-297 Zeljić, Aleksandar I-443 Zennou, Rachid II-267 Zhan, Bohua II-187 Zhan, Naijun I-650, II-187 Zhang, Zhen I-540 Zhang, Zhenya I-401 Zhao, Yongwang II-515 Zheng, Hao I-540